Road and Traffic Signs Recognition using Support Vector Machines

Min Shi

Master Thesis, 2006

DEGREE PROJECT
Master of Science in Computer Engineering, Computer Engineering Programme

Reg number: E3395D
Extent: 30 ECTS
Name of student: Min Shi
Year-Month-Day: 2006-07-18
Supervisor: Hasan Fleyeh
Examiner: Mark Dougherty
Company/Department: Department of Computer Engineering, Dalarna University
Supervisor at the Company/Department: Hasan Fleyeh

Title: Road and Traffic Signs Recognition using Support Vector Machines
Keywords: Road Sign Recognition, Support Vector Machines (SVM), Zernike Moments, Fuzzy ARTMAP, Intelligent Transportation System

Abstract
An Intelligent Transportation System (ITS) builds a safe, effective and integrated transportation environment based on advanced technologies. Road sign detection and recognition is an important part of ITS, offering a way to collect real-time traffic data for processing at a central facility. This project implements a road sign recognition model based on AI and image analysis technologies, applying a machine learning method, Support Vector Machines (SVM), to recognize road signs. We focus on recognizing seven categories of road sign shapes and five categories of speed limit signs. Two kinds of features, binary images and Zernike moments, are used to represent the data to the SVM for training and testing. We compare and analyze the performance of the SVM recognition model using different features and different kernels. Moreover, the performance of two different recognition models, SVM and Fuzzy ARTMAP, is compared.

Acknowledgement
This work was supported by the Department of Computer Engineering at Dalarna University. I am greatly indebted to my supervisor, Mr Hasan Fleyeh, for motivating this work and providing much advice and help. During my Master's studies, he taught me a great deal about Computer Graphics and Digital Image Processing, enabling me to lay a solid foundation for this work. I am grateful to Professor Mark Dougherty, Dr Pascal Rebreyend and all the other teachers at Dalarna University for their instruction. I am also grateful to all my classmates, especially Xiao Hu and Jia Cai, for their help during my studies, and to all my friends in Sweden for the shared memories. Last but not least, I would like to give my special thanks to my husband and my family for their love and support.

Min Shi

Degree Project E3395D

July 2006

Contents

CHAPTER ONE INTRODUCTION .......... 7
1.1 BACKGROUND .......... 8
1.2 SYSTEM STRUCTURE .......... 9
1.3 SWEDISH ROAD SIGNS .......... 10
  1.3.1 Properties of Road Signs .......... 13
1.4 SHAPE-BASED RECOGNITION .......... 14
1.5 POTENTIAL DIFFICULTIES .......... 15

CHAPTER TWO SUPPORT VECTOR MACHINES .......... 19
2.1 INTRODUCTION .......... 20
  2.1.1 Machine Learning .......... 20
  2.1.2 Supervised Learning .......... 21
  2.1.3 Learning and Generalization .......... 21
  2.1.4 Support Vector Machines for Learning .......... 23
2.2 LINEAR CLASSIFICATION .......... 24
  2.2.1 Perceptron .......... 25
  2.2.2 Dual Representation .......... 26
2.3 NON-LINEAR CLASSIFICATION .......... 27
  2.3.1 Learning in Feature Space .......... 28
  2.3.2 Implicit Mapping to Feature Space .......... 29
2.4 KERNEL .......... 29
  2.4.1 Kernel Matrix .......... 30
  2.4.2 Properties of Kernels .......... 30
  2.4.3 Examples of Kernels .......... 31
  2.4.4 Kernel Selection .......... 34
2.5 GENERALIZATION .......... 35
  2.5.1 VC Dimension .......... 35
  2.5.2 ERM vs SRM .......... 36
2.6 MAXIMUM MARGIN CLASSIFIER .......... 37
2.7 SOFT MARGIN CLASSIFIER .......... 39
2.8 MULTI-CLASS CLASSIFIER .......... 40
2.9 TYPES OF SVM .......... 41
  2.9.1 C-SVM Classification .......... 41
  2.9.2 ν-SVM Classification .......... 42

CHAPTER THREE METHODOLOGY .......... 43
3.1 INTRODUCTION .......... 44
3.2 LIBSVM .......... 45
______________________________________________________________________________________________________________________________________________________________________

Dalarna University Röda vägen 3 S-781 88 Borlänge Sweden

-1-

Tel: +46 (0)23 778000 Fax: +46 (0)23 778080 http://www.du.se


3.3 DATA REPRESENTATION .......... 45
3.4 DATA NORMALIZATION .......... 46
3.5 BINARY REPRESENTATION .......... 47
3.6 ZERNIKE MOMENTS .......... 47
  3.6.1 Definition of Zernike Moments .......... 48
  3.6.2 Image Normalization .......... 49
  3.6.3 Moments Representation .......... 50
CHAPTER FOUR EXPERIMENTS AND ANALYSIS .......... 51
4.1 INTRODUCTION .......... 52
4.2 CLASSIFICATION WITH DIFFERENT FEATURES .......... 55
  4.2.1 Binary Representation .......... 55
  4.2.2 Zernike Moments Representation .......... 59
4.3 CLASSIFICATION WITH DIFFERENT KERNELS AND SVM TYPES .......... 64
4.4 COMPARISON OF DIFFERENT RECOGNITION MODELS .......... 76

CHAPTER FIVE CONCLUSION AND FUTURE WORK .......... 78

APPENDIX A: USER MANUAL .......... 81
A.1 CONVERT IMAGE DATA TO SVM DATA .......... 81
A.2 CONVERT ZERNIKE DATA TO SVM DATA .......... 84
A.3 TRAIN AND TEST SVM WITH DEFAULT VALUES .......... 86
A.4 TRAIN AND TEST SVM MANUALLY .......... 88
A.5 TRAIN AND TEST SVM FROM A PARAMETER FILE .......... 89
A.6 TRAIN AND TEST SVM WITH GRID .......... 90
A.7 TRAIN AND TEST SVM WITH SA .......... 91
A.8 PREDICT AN IMAGE FILE .......... 92

REFERENCES .......... 94


List of Figures

FIGURE 1.1 A STRUCTURE OF ROAD SIGN DETECTION AND RECOGNITION SYSTEM .......... 10
FIGURE 1.2 EXAMPLES OF WARNING SIGNS .......... 11
FIGURE 1.3 EXAMPLES OF PROHIBITORY SIGNS .......... 12
FIGURE 1.4 EXAMPLES OF MANDATORY SIGNS .......... 12
FIGURE 1.5 EXAMPLES OF SIGNS GIVING INFORMATION .......... 13
FIGURE 1.6 SHAPES FOR RECOGNITION .......... 14
FIGURE 1.7 SPEED LIMITS FOR RECOGNITION .......... 14
FIGURE 1.8 ROAD SIGN RECOGNITION MODEL .......... 15
FIGURE 1.9 EXAMPLES OF DIFFICULTIES [10] .......... 17
FIGURE 2.1 BLOCK DIAGRAM OF SUPERVISED LEARNING .......... 21
FIGURE 2.2 AN OVERFITTING CLASSIFIER AND A BETTER CLASSIFIER .......... 22
FIGURE 2.3 AN OVERVIEW OF THE SVM PROCESS .......... 23
FIGURE 2.4 TWO WAYS TO SEPARATE DATA WITH TWO CATEGORIES .......... 24
FIGURE 2.5 LINEAR CLASSIFICATION FOR TWO-DIMENSIONAL INPUT VECTORS .......... 25
FIGURE 2.6 A MAPPING FROM A TWO-DIMENSIONAL INPUT SPACE TO A TWO-DIMENSIONAL FEATURE SPACE .......... 28
FIGURE 2.7 TWO EXAMPLES OF VC DIMENSION .......... 36
FIGURE 2.8 THE BOUND ON ACTUAL RISK OF A CLASSIFIER .......... 37
FIGURE 2.9 AN EXAMPLE OF SOFT MARGIN CLASSIFIER .......... 40
FIGURE 3.1 A BLOCK DIAGRAM OF ROAD SIGNS CLASSIFICATION USING SVM .......... 44
FIGURE 3.2 AN EXAMPLE OF BINARY REPRESENTATION OF ROAD SIGN .......... 47
FIGURE 4.1 SOME BINARY IMAGES AND THEIR CORRESPONDING CATEGORIES OF SPEED LIMIT SIGNS FOR RECOGNITION .......... 53
FIGURE 4.2 SOME BINARY IMAGES AND THEIR CORRESPONDING CATEGORIES OF ROAD SIGN SHAPES FOR RECOGNITION .......... 54
FIGURE 4.3 SOME INSTANCES OF SPEED LIMIT SIGN 70 WITH PEPPER NOISE .......... 59
FIGURE 4.4 RECONSTRUCTED IMAGE WITH DIFFERENT ORDERS OF ZERNIKE MOMENTS .......... 64
FIGURE 4.5 THE PERFORMANCE OF THE SVM MODEL USING LINEAR KERNEL AND DIFFERENT PARAMETER C FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 67
FIGURE 4.6 THE SUPPORT VECTORS OF THE SVM MODEL USING LINEAR KERNEL AND DIFFERENT PARAMETER ν FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 68
FIGURE 4.7 THE PERFORMANCE OF THE SVM MODEL USING LINEAR KERNEL AND DIFFERENT PARAMETER ν FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 68
FIGURE 4.8 THE PERFORMANCE OF THE SVM MODEL USING RBF KERNEL AND C-SVM OF DIFFERENT PARAMETER γ FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 69
FIGURE 4.9 THE PERFORMANCE OF THE SVM MODEL USING SIGMOID KERNEL AND C-SVM OF DIFFERENT PARAMETER r FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 70
FIGURE 4.10 THE PERFORMANCE OF THE SVM MODEL USING POLYNOMIAL KERNEL AND C-SVM OF DIFFERENT PARAMETER d FOR SPEED LIMIT RECOGNITION WITH ZERNIKE MOMENT REPRESENTATION .......... 71
FIGURE 4.11 THE PERFORMANCE OF SA SEARCH WITH DIFFERENT COOLING RATIOS .......... 75
FIGURE A.1 A USE CASE OF CONVERTING IMAGE DATA TO SVM DATA .......... 81
FIGURE A.2 DISPLAY OF THE BINARY IMAGES SELECTED FOR TEST AND THE BINARY IMAGES SELECTED FOR TRAINING .......... 83
FIGURE A.3 A USE CASE OF CONVERTING ZERNIKE MOMENTS DATA TO SVM DATA .......... 84
FIGURE A.4 DISPLAY OF THE TOTAL NUMBER OF TRAINING DATA AND THE TOTAL NUMBER OF TEST DATA FOR CONVERTING ZERNIKE MOMENTS TO SVM .......... 85
FIGURE A.5 THE STEPS OF TRAINING AND TESTING THE SVM MODEL WITH DEFAULT VALUES .......... 86
FIGURE A.6 DISPLAY OF THE TRAINING AND TEST RESULTS .......... 87
FIGURE A.7 AN ILLUSTRATION OF THE PROCESS AND THE CREATED FILES AT EVERY STEP .......... 88
FIGURE A.8 AN ILLUSTRATION OF THE PROCESS AND THE CREATED FILES AT EVERY STEP .......... 89
FIGURE A.9 A USE CASE OF TRAINING AND TESTING SVM FROM A PARAMETER FILE .......... 90
FIGURE A.10 A USE CASE OF TRAINING AND TESTING THE SVM MODEL WITH GRID SEARCH .......... 91
FIGURE A.11 A USE CASE OF TRAINING AND TESTING THE SVM MODEL WITH SA SEARCH .......... 92
FIGURE A.12 A USE CASE OF PREDICTING AN IMAGE FILE .......... 93
FIGURE A.13 THE PROCESS OF PREDICTING WITH PROBABILITY .......... 93


List of Tables

TABLE 1.1 SOME CHARACTERISTICS OF TWO WAYS FOR ROAD SIGNS DETECTION AND RECOGNITION .......... 9
TABLE 1.2 MAIN COLORS AND SHAPES IN SWEDISH ROAD SIGNS .......... 13
TABLE 2.1 THE PRIMAL FORM OF THE PERCEPTRON ALGORITHM .......... 26
TABLE 2.2 THE DUAL FORM OF THE PERCEPTRON ALGORITHM .......... 27
TABLE 3.1 ROAD SIGN SHAPES GROUP .......... 45
TABLE 3.2 SPEED LIMIT SIGNS GROUP .......... 45
TABLE 3.3 THE ZERNIKE MOMENTS REPRESENTATION OF ROAD SIGN .......... 50
TABLE 4.1 DESIRED OUTPUTS OF ROAD SIGN SHAPES WITH BINARY REPRESENTATION .......... 55
TABLE 4.2 DESIRED OUTPUTS OF SPEED LIMIT SIGNS WITH BINARY REPRESENTATION .......... 55
TABLE 4.3 THE CORRECT CLASSIFICATION RATE OF ROAD SIGN SHAPES ON THE TEN PAIRS OF TRAINING/TEST DATA SETS WITH BINARY REPRESENTATION .......... 56
TABLE 4.4 CONFUSION MATRIX OF ROAD SIGN SHAPES CLASSIFICATION WITH BINARY REPRESENTATION ON TRAINING SET .......... 56
TABLE 4.5 CONFUSION MATRIX OF ROAD SIGN SHAPES CLASSIFICATION WITH BINARY REPRESENTATION ON TEST SET .......... 57
TABLE 4.6 THE CORRECT CLASSIFICATION RATE OF SPEED LIMIT SIGNS ON THE TEN PAIRS OF TRAINING/TEST DATA SETS WITH BINARY REPRESENTATION .......... 57
TABLE 4.7 CONFUSION MATRICES OF ONE WORST PAIR OF RESULTS FOR SPEED LIMIT SIGNS CLASSIFICATION WITH BINARY REPRESENTATION ON TRAINING SET .......... 58
TABLE 4.8 CONFUSION MATRICES OF ONE WORST PAIR OF RESULTS FOR SPEED LIMIT SIGNS CLASSIFICATION WITH BINARY REPRESENTATION ON TEST SET .......... 58
TABLE 4.9 TWO EXAMPLES OF ROAD SIGN IMAGES THAT WERE CLASSIFIED INCORRECTLY WITH BINARY REPRESENTATION .......... 58
TABLE 4.10 DESIRED OUTPUTS OF ROAD SIGN SHAPES WITH ZERNIKE MOMENTS REPRESENTATION .......... 59
TABLE 4.11 DESIRED OUTPUTS OF SPEED LIMIT SIGNS WITH ZERNIKE MOMENTS REPRESENTATION .......... 59
TABLE 4.12 THE CORRECT CLASSIFICATION RATE OF ROAD SIGN SHAPES ON THE TEN PAIRS OF TRAINING/TEST DATA SETS WITH ZERNIKE MOMENTS REPRESENTATION .......... 60
TABLE 4.13 CONFUSION MATRICES OF THE WORST PAIR OF RESULTS FOR ROAD SIGN SHAPES CLASSIFICATION WITH ZERNIKE MOMENTS REPRESENTATION ON TRAINING SET .......... 61
TABLE 4.14 CONFUSION MATRICES OF THE WORST PAIR OF RESULTS FOR ROAD SIGN SHAPES CLASSIFICATION WITH ZERNIKE MOMENTS REPRESENTATION ON TEST SET .......... 61
TABLE 4.15 THE CORRECT CLASSIFICATION RATE OF SPEED LIMIT SIGNS ON THE TEN PAIRS OF TRAINING/TEST DATA SETS WITH ZERNIKE MOMENTS REPRESENTATION .......... 62
TABLE 4.16 CONFUSION MATRIX OF THE WORST TRAINING RESULT FOR SPEED LIMIT SIGNS CLASSIFICATION WITH ZERNIKE MOMENTS REPRESENTATION .......... 62
TABLE 4.17 CONFUSION MATRIX OF THE WORST TEST RESULT FOR SPEED LIMIT SIGNS CLASSIFICATION WITH ZERNIKE MOMENTS REPRESENTATION .......... 63
TABLE 4.18 SOME INSTANCES OF ROAD SIGNS THAT WERE CLASSIFIED INCORRECTLY WITH ZERNIKE MOMENTS REPRESENTATION .......... 63
TABLE 4.19 THE CORRECT CLASSIFICATION RATE OF ROAD SIGN SHAPES USING DIFFERENT KERNELS AND SVM TYPES WITH BINARY REPRESENTATION .......... 65
TABLE 4.20 THE CORRECT CLASSIFICATION RATE OF SPEED LIMIT SIGNS USING DIFFERENT KERNELS AND SVM TYPES WITH BINARY REPRESENTATION .......... 65
TABLE 4.21 THE CORRECT CLASSIFICATION RATE OF ROAD SIGN SHAPES USING DIFFERENT KERNELS AND SVM TYPES WITH ZERNIKE MOMENTS REPRESENTATION .......... 66
TABLE 4.22 THE CORRECT CLASSIFICATION RATE OF SPEED LIMIT SIGNS USING DIFFERENT KERNELS AND SVM TYPES WITH ZERNIKE MOMENTS REPRESENTATION .......... 66
TABLE 4.23 THE CONTRAST OF RESULTS WITH AND WITHOUT GRID SEARCH FOR ROAD SIGN SHAPES CLASSIFICATION USING RBF KERNEL WITH ZERNIKE MOMENTS REPRESENTATION .......... 72
TABLE 4.24 THE CONTRAST OF RESULTS WITH AND WITHOUT GRID SEARCH FOR SPEED LIMIT SIGNS CLASSIFICATION USING RBF KERNEL WITH ZERNIKE MOMENTS REPRESENTATION .......... 73
TABLE 4.25 PROBABILITY OF ACCEPTANCE FOR T=0.1 AND EVAL(VC)=0.5 .......... 74
TABLE 4.26 THE CONTRAST OF GRID SEARCH AND SA SEARCH FOR ROAD SIGN SHAPES CLASSIFICATION USING RBF KERNEL WITH ZERNIKE MOMENTS REPRESENTATION .......... 75
TABLE 4.27 THE CONTRAST OF GRID SEARCH AND SA SEARCH FOR SPEED LIMIT SIGNS CLASSIFICATION USING RBF KERNEL WITH ZERNIKE MOMENTS REPRESENTATION .......... 76
TABLE 4.28 BEST CLASSIFICATION RESULTS FROM DIFFERENT RECOGNITION MODELS .......... 77
TABLE A.1 THE FORMAT OF IMAGE LIST FILE .......... 82
TABLE A.2 THE FORMAT OF ZERNIKE MOMENTS DATA FILE .......... 84
TABLE A.3 THE FORMAT OF PARAMETER FILE .......... 89


Chapter One Introduction



1.1 Background
Technological developments have made it possible to drive a robotic vehicle automatically on roads. An intelligent transportation system (ITS) is a general term for a wide range of technologies incorporated into traditional transportation infrastructure and vehicles. These systems can include roadway sensors, in-vehicle navigation services, electronic message signs, and traffic management and monitoring. ITS technologies are being widely deployed to maximize transportation safety and efficiency[1]. ITS aims to manage factors that are typically at odds with each other, such as vehicles, loads, and routes, in order to improve safety and reduce vehicle wear, transportation times and fuel costs[2]. Intelligent transportation systems vary in the technologies applied, from basic management systems such as car navigation, traffic light control systems, container management systems, variable message signs or speed cameras, to monitoring applications such as security CCTV systems, and on to more advanced applications that integrate live data and feedback from a number of other sources, such as real-time weather and bridge de-icing systems. Additionally, predictive techniques are being developed to allow advanced modeling and comparison with historical baseline data[2].

Road sign detection and recognition is an important part of ITS, offering a way to collect real-time traffic data for processing at a central facility. It can be realized in two different ways:

- Information communication technologies based on distributed and pervasive applications.
- Intelligent detection and recognition based on artificial intelligence and image analysis.

Some characteristics of both methods for road sign detection and recognition are shown in table 1.1.



Table 1.1: Some characteristics of two ways for road signs detection and recognition

Properties     | Information Communication                                          | Intelligent Detection and Recognition
Data type      | Digital signal                                                     | Physical signal to digital signal
Data receiver  | Sensor and software interface                                      | Digital camera
Environment    | Soft environment                                                   | Hard environment
Technologies   | Distributed and pervasive application based on wireless communication | AI and image analysis
Basic services | Utilize network-based services                                     | None
Accuracy       | Refined classification                                             | Coarse classification
Feasibility    | Immature                                                           | Feasible

Although the information communication approach can implement refined recognition of road signs, its implementation depends on related facilities and services, for example network services, road condition data and road sign sensors. In contrast, implementing detection and recognition with AI and image analysis technologies only requires mounting a digital camera on each vehicle. In a complex real-time environment, neither of these two methods can completely replace the other; in fact, combining both methods makes the system more stable and reliable, providing higher security. This project implements road sign recognition based on AI and image analysis technologies, applying a machine learning method, Support Vector Machines, to recognize road signs from two kinds of digital signals: binary images and Zernike moments.

1.2 System Structure
When vehicles are driven on public roads, the rules of the road must be obeyed. Some of these rules are communicated through road signs, so an autonomous vehicle must be able to detect and recognize road signs and adjust its behavior accordingly.



[Figure 1.1: Road Image Capturing feeds a Detection Model (Color Segmentation and Shape Analysis, Extract Road Sign); the resulting Feature Values feed a Recognition Model (Classification Machine, Validation).]
Figure 1.1 A structure of road sign detection and recognition system

Figure 1.1 shows the structure of a road sign detection and recognition system based on AI and image analysis technologies. A digital camera mounted at the front of a vehicle takes pictures of the road, which are transferred into the system. For road sign detection, color thresholding is used to segment the image, followed by shape analysis. If a road sign is detected in an image, only the road sign part of the image is kept. Before a road sign image is input into a trained learning machine to be recognized, the feature values of this image have to be calculated; the learning machine then outputs a value that identifies the road sign. In this project, a recognition model is implemented which constructs the learning machine using a relatively new pattern recognition technique, Support Vector Machines. To compare and analyze the effects of training the SVM with different feature values, two kinds of features, binary images and Zernike moments, are used to train and test the SVM.
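The detect-extract-classify chain described above can be sketched in code. This is only an illustrative outline under simplifying assumptions (a naive red-channel threshold, a single connected region, a fixed-size binary feature grid); the helper names `color_segment`, `extract_sign` and `binary_features` are hypothetical, not the actual implementation used in this project.

```python
import numpy as np

def color_segment(image):
    """Crude color thresholding: keep pixels where red dominates (placeholder rule)."""
    return (image[..., 0] > 128) & (image[..., 1] < 100)

def extract_sign(image, mask):
    """Keep only the road sign part: crop the bounding box of the segmented pixels."""
    ys, xs = np.nonzero(mask)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def binary_features(sign, size=(36, 36)):
    """Resample the cropped sign to a fixed grid and binarize it into a feature vector."""
    h, w = sign.shape[:2]
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    patch = sign[np.ix_(rows, cols)]              # nearest-neighbor downscale
    return (patch.mean(axis=-1) > 127).astype(np.float64).ravel()

def recognize(image, svm_predict):
    """Full pipeline: detect, extract features, classify with a trained SVM predictor."""
    mask = color_segment(image)
    sign = extract_sign(image, mask)
    return svm_predict(binary_features(sign))
```

In the project itself, `svm_predict` would be a model trained with LIBSVM; here it is just any callable taking the feature vector.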

1.3 Swedish Road Signs
Road signs are designed to be easily recognized by human drivers, mainly because their colors and shapes are very different from natural environments[3]. The Swedish Road Administration is in charge of defining the appearance of all signs and road markings in Sweden, and divides the road signs into four different classes.



Warning signs. A traffic warning sign indicates a hazard ahead on the road[4]. In Sweden, it is an equilateral triangle with a thick red border and a yellow background, because a red/yellow sign is easier to see in snowy weather. Some other signs, like the distance to level crossing signs and track level crossing signs, also belong to this class.

Prohibitory signs. Prohibitory traffic signs are used to prohibit certain types of maneuvers or certain types of traffic, for example the no entry sign, the no parking sign and speed limit signs. Normally they are circular with a thick red border and a yellow background. There are a few exceptions: the international standard stop sign is an octagon with a red background and white border, and the NO PARKING and NO STANDING signs have a blue background instead of yellow. The ending restriction signs are marked with black bars.

Mandatory signs. These are always round blue signs with a white border. They control the actions of drivers and road users. Signs ending an obligation have a red slash.

Signs giving information. The Swedish diamond-shaped and rectangle-shaped signs inform about priority roads, including the services along the road. These signs normally have green, yellow, white, blue or black backgrounds.

Figure 1.2 Examples of warning signs


Figure 1.3 Examples of prohibitory signs

Figure 1.4 Examples of mandatory signs


Figure 1.5 Examples of signs giving information

1.3.1 Properties of Road Signs

The examples above (figures 1.2-1.5) show that colors and shapes are the basic characteristics of road signs. Road signs are designed, manufactured and installed according to tight regulations. They are designed in fixed 2-D shapes such as triangles, circles, octagons and rectangles. The colors of the signs are chosen to stand out from the environment, which makes them easily recognizable by drivers[5]. There are mainly seven different shapes and seven different colors in Swedish road signs, as shown in table 1.2. A road sign detection and recognition system can be built on color information, shape information, or both; combining color and shape information may give better results[5].

Table 1.2 Main colors and shapes in Swedish road signs

Colors: White, Yellow, Orange, Red, Green, Blue, Black
Shapes: Rectangle, Diamond, Circle, Octagon, Cross Buck, Upward Triangle, Downward Triangle


1.4 Shape-based Recognition

In recent years there has been a surge of papers describing road sign recognition methods. One argument for using shape information for road sign recognition is the lack of standard colors among countries: systems that rely on color need to be re-tuned when moving from one country to another. Another argument is the fact that colors vary as daylight and reflectance properties vary. In situations where it is difficult to extract color information, such as twilight and nighttime, shape detection is a good alternative[5]. This project focuses on recognizing seven categories of road signs (figure 1.6) and five speed limit signs (figure 1.7), since compared with other categories these road signs are more important and more difficult for computers to classify.

Figure 1.6 Shapes for recognition

Figure 1.7 Speed limits for recognition

All of these road signs will be recognized by shape information only; in other words, the color properties of the road signs are ignored during the classification process. A large number of road sign samples is used for training the learning machine, while new road sign samples are used to verify it.


[Figure 1.8 diagram: Binary Image → Feature Model (Zernike Moments) → Classification Model (Trained SVM) → Output: Matched Class]

Figure 1.8 Road sign recognition model

The binary image of the road sign is extracted as one of its features to train and test the learning machine. The feature model is optional in the recognition model; in this project Zernike moments are used to select the features of a binary image (figure 1.8). The detailed recognition process using SVM is discussed in chapter three.

1.5 Potential Difficulties

Correct identification of traffic signs at the right time and place is very important for car drivers to ensure a safe journey for themselves and their passengers. However, due to changes in weather conditions or viewing angles, traffic signs are sometimes difficult to see until it is too late[6]. These factors introduce the following difficulties for road sign detection and recognition:

The color of the sign fades with time as a result of long exposure to sunlight[7], and signs may be damaged (figure 1.9 a-b).

Obstacles such as trees, poles, buildings, and even vehicles and pedestrians may occlude or partially occlude road signs[8] (figure 1.9 c).

Video images of road signs often suffer from blurring because the camcorder is mounted on a moving vehicle[8] (figure 1.9 d).

Lighting conditions are changeable and not controllable. Lighting differs according to the time of day, the season, cloudiness and other weather conditions[9] (figure 1.9 e-g).

Models of all possible appearances of a sign cannot be generated off-line, because there are too many degrees of freedom[9] (figure 1.9 h).


Objects similar in color and/or shape to road signs, such as buildings or vehicles, may be present in the scene under consideration[7] (figure 1.9 i).

Signs that do not apply to the vehicle's own road may be recognized by mistake (figure 1.9 j).

(a) Faded sign

(b) Damaged sign

(c) Partial occlusions

(d) Blurry image

(e) Bad light

(f) Shadows


(g) Bad weather


(h) Shape deformation

(i) Object with similar color

(j) Identification of wrong signs

Figure 1.9 Examples of difficulties [10]

Many methods have been proposed to overcome some of these difficulties. Escalera et al.[11] deal with road sign recognition in environments where lighting conditions cannot be controlled or predicted, objects may be partially occluded, and position and orientation are not known a priori. A genetic algorithm is used for the detection step, allowing localization that is invariant to changes in position, scale, rotation, weather conditions, partial occlusion, and the presence of other objects of the same color. A neural network performs the classification. In [5, 12-16] Hasan Fleyeh proposed several other methods. For example, in [16] he presented a new algorithm for traffic sign color detection and segmentation in poor light conditions: the RGB channels of the road images are enhanced separately by histogram equalization, the true colors of the sign are then extracted by a color constancy method, and the resultant image is converted into HSV color space and segmented to extract the colors of the road signs. In [15] he developed a color detection and segmentation algorithm for road signs in which the effects of shadows and highlights are suppressed to obtain better color segmentation results: the RGB images of road signs are converted into HSV color space and a shadow-highlight invariant method is applied to extract the colors of the road signs under shadow and highlight conditions.


Chapter Two Support Vector Machines


2.1 Introduction

Support Vector Machines (SVM) are a machine learning method based on the mathematical foundations of statistical learning theory, first proposed by Vapnik in 1992. In the last few years there have been very significant developments in the theoretical understanding of SVM, in algorithmic strategies for implementing them, and in applications of the approach to practical problems[17].

2.1.1 Machine Learning

Machine learning is the study of computer algorithms that allow computers to learn automatically from experience[18]. This experience includes data observation, statistics, analysis and other means, which gives a system the capacity for self-improvement and thereby lets it acquire knowledge in certain fields. Learning is an intelligent process of acquiring knowledge, and there are several parallels between animal learning and machine learning. In intelligent systems, some techniques derive from biological research, using computational models to make theories of animal learning more precise.

In general there are two kinds of learning: inductive learning and deductive learning. Inductive machine learning methods extract rules and patterns from massive data sets[19]. These rules can be applied to new data sets, but there is no guarantee that they will be correct[20]; normally, the accuracy rate is calculated as one of the criteria for evaluating the rules. For example, given ten features of a human, an inductive learning system might infer that any animal exhibiting some of those features is a human. Deductive learning methods learn from a set of known facts and rules to produce additional rules that are guaranteed to be true[20]. Given the rules "All men are mortal" and "Jones is a man", a deductive learning system can deduce that "Jones is mortal".

Machine learning usually performs tasks associated with Artificial Intelligence (AI), such as recognition, diagnosis, planning, robot control and prediction[21]. As an important field of Artificial Intelligence, it has been increasingly successful in real-world applications such as data mining, medical diagnosis, search engines, and speech and image recognition. This paper presents a machine learning method to perform an image recognition task: to predict which image corresponds to which road sign. This method, support vector machines, is an inductive learning method, more precisely a supervised learning method, in which one group of data samples is fed into the system to train the machine and another group is used to test and verify it.

2.1.2 Supervised Learning

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of inputs and target outputs). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way[22]. Figure 2.1 shows a block diagram of supervised learning. The pairs of inputs and outputs describe the state of the environment. Assume that the teacher and the learning system both draw a training vector from the environment, and that the teacher provides the learning system with a desired response for that training vector. The parameters of the learning system are adjusted according to the error signal, and the adjustment is carried out iteratively until the learning system can emulate the teacher and is able to deal with the environment without the teacher.

Figure 2.1 Block diagram of supervised learning

2.1.3 Learning and Generalization

Early machine learning algorithms aimed to learn representations of simple symbolic functions that could be understood and verified by experts[23]; the goal was to find an accurate fit to the training data. However, a function that overfits the training data makes essentially uncorrelated predictions on unseen data. Generalization is the ability to correctly classify data not in the training set. It allows the learning system to cope with examples it has not seen before, and hence to be more flexible and useful. Figure 2.2 shows a binary classification problem that illustrates the difference between an overfitting classifier and a better classifier. Filled squares and triangles are the training data, while hollow squares and triangles are the test data. The test accuracy of the classifier in figures 2.2 (a) and (b) is poor since it overfits the training data, while the better classifier in figures 2.2 (c) and (d), built using generalization theory, gives better test accuracy. More detail about generalization is discussed in section 2.5.

Figure 2.2 An overfitting classifier and a better classifier
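The overfitting effect can be reproduced numerically. The following sketch uses hypothetical data (not from the thesis experiments): a degree-12 polynomial fits 20 noisy samples of a linear trend at least as tightly as a degree-1 fit, but it models the noise rather than the underlying function.

```python
# Numerical illustration of overfitting (hypothetical data): a high-capacity
# model fits the training noise, a low-capacity model captures the trend.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.3, size=20)  # noisy line
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2.0 * x_test + 1.0                                    # true trend

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

lo = np.polyfit(x_train, y_train, deg=1)    # low-capacity model
hi = np.polyfit(x_train, y_train, deg=12)   # high-capacity model (overfits)

train_lo, train_hi = mse(lo, x_train, y_train), mse(hi, x_train, y_train)
test_lo, test_hi = mse(lo, x_test, y_test), mse(hi, x_test, y_test)
```

Since the degree-1 polynomials are a subspace of the degree-12 polynomials, the training error of the flexible model is never larger, yet its behavior between the training points is erratic, mirroring figure 2.2 (a)-(b) versus (c)-(d).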


2.1.4 Support Vector Machines for Learning

Support vector machines are learning systems that use a hypothesis space of linear functions in a high dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory (Cristianini and Shawe-Taylor, 2000). SVM can perform pattern recognition (classification) tasks by building decision boundaries that optimally separate the data into categories, and real valued function approximation (regression) tasks by constructing a function that best interpolates a given data set.

Figure 2.3 An overview of the SVM process

An overview of the SVM classification process is shown in figure 2.3. A set of data with predictor variables, called attributes, is presented in an input space. Since these data cannot be separated by a linear function in the input space, the attributes are transformed into a feature space; in other words, each data point in the input space has a mapping in the feature space. The goal of the transformation is to make the separation easier. A set of features that describes one case is called a vector. In the feature space a hyperplane that divides the clusters of vectors can be found, so that data with one category of the target variable lie on one side of the plane and data with the other category lie on the other side.


Figure 2.4 Two ways of separating data with two categories

Besides separating the data into different categories, the objective of SVM is to find an optimal hyperplane that classifies the data as correctly as possible and separates the two classes as widely as possible. Figure 2.4 shows two ways of separating data with two categories, one represented by filled circles and the other by hollow circles. The broken lines mark the boundaries that run parallel to the separating line through the vectors closest to it. The distance between the two broken lines is called the margin, and the vectors marked with hollow squares are the support vectors; they constrain the width of the margin. To define the optimal hyperplane, the SVM finds the hyperplane that maximizes the margin. Because of the nature of the feature space in which these boundaries are found, Support Vector Machines can exhibit a large degree of flexibility in handling classification tasks of varied complexity. Several types of SVM kernels, including linear, polynomial, radial basis function and sigmoid, are introduced in this paper. SVM models work very similarly to classical neural networks; in fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer feed-forward neural network. However, compared with traditional neural network approaches, the generalization theory of SVM enables the models to avoid overfitting the data.
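The geometric margin described above can be computed directly. The minimal sketch below (the toy data and the candidate hyperplane are hypothetical, chosen only for illustration) measures the smallest distance from any training point to a given separating hyperplane:

```python
import numpy as np

# Toy linearly separable data (hypothetical): two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the data.
    Positive iff the hyperplane separates the two classes correctly."""
    return float(np.min(y * (X @ w + b) / np.linalg.norm(w)))

w, b = np.array([1.0, 1.0]), 0.0   # candidate hyperplane x1 + x2 = 0
margin = geometric_margin(w, b, X, y)
```

A maximum-margin classifier would search over (w, b) for the hyperplane making this quantity as large as possible.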

2.2 Linear Classification

Linear functions are the best understood and simplest methods applied in machine learning. This section introduces the basic ideas for solving linear classification problems; the goal is to prepare for the non-linear classification introduced in the next section. Linear classification is normally performed using a linear function of the input vectors. This function can be written as

    f(x) = w · x + b = Σ_{i=1}^{n} w_i x_i + b        (2.1)

where x_i is the i-th attribute value of an input vector x, w_i is the weight for the attribute x_i, and b is the bias. For binary classification, the decision rule is given by sgn(f(x)): the input vector x = (x_1, ..., x_n)' is assigned to the positive class if f(x) ≥ 0, and otherwise to the negative class.

Figure 2.5 Linear classification for two-dimensional input vectors

Figure 2.5 plots the interpretation of linear classification for two-dimensional input vectors. The input space is divided into two parts by a bold line called the hyperplane, defined by the equation f(x) = Σ_{i=1}^{n} w_i x_i + b = 0; the region above the hyperplane belongs to the positive class, while the region below it belongs to the negative class. The weight vector w defines the slope of the hyperplane and b defines the offset between the hyperplane and the origin of the input space. Thus an (n − 1)-dimensional hyperplane separates an n-dimensional space, and (n + 1) parameters are used for adjusting the hyperplane.
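Equation (2.1) and the decision rule sgn(f(x)) translate directly into code. A minimal sketch with a hypothetical hyperplane x1 + x2 − 1 = 0:

```python
import numpy as np

def linear_classify(w, b, x):
    """Decision rule sgn(f(x)) for f(x) = w . x + b (equation 2.1):
    positive class (+1) if f(x) >= 0, negative class (-1) otherwise."""
    return 1 if float(np.dot(w, x) + b) >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 1 = 0 in two dimensions.
w, b = np.array([1.0, 1.0]), -1.0
above = linear_classify(w, b, np.array([2.0, 0.0]))   # f = 1, positive side
below = linear_classify(w, b, np.array([0.0, 0.0]))   # f = -1, negative side
```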

2.2.1 Perceptron

The first iterative algorithm for linear classification, the perceptron, was proposed by Frank Rosenblatt in the 1950s. The algorithm is shown in table 2.1.

Table 2.1 The primal form of the perceptron algorithm

    Given a linearly separable training set S and learning rate η ∈ R+
    (l is the number of training samples)
    w_0 ← 0; b_0 ← 0; k ← 0
    R ← max_{1≤i≤l} ||x_i||
    repeat
        for i = 1 to l
            if y_i(w_k · x_i + b_k) ≤ 0 then
                w_{k+1} ← w_k + η y_i x_i
                b_{k+1} ← b_k + η y_i R²
                k ← k + 1
            end if
        end for
    until no mistakes are made within the for loop
    return (w_k, b_k), where k is the number of mistakes

The algorithm starts with an initial weight vector w_0, and the weight vector and bias are updated whenever a training point is misclassified by the current weights. This procedure converges if a hyperplane exists that correctly classifies the training data; in that case the data are said to be linearly separable, and otherwise non-linearly separable.
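The pseudocode of table 2.1 can be implemented almost line by line. The sketch below (the toy data are hypothetical) runs the primal perceptron until an epoch completes with no mistakes:

```python
import numpy as np

def perceptron_primal(X, y, eta=1.0, max_epochs=100):
    """Primal perceptron as in table 2.1: update w and b on every
    misclassified point; stop when an epoch passes with no mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    R = max(np.linalg.norm(x) for x in X)   # R = max_i ||x_i||
    k = 0                                   # number of mistakes (updates)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on plane)
                w = w + eta * yi * xi
                b = b + eta * yi * R ** 2
                k += 1
                mistakes += 1
        if mistakes == 0:                       # converged
            break
    return w, b, k

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b, k = perceptron_primal(X, y)
```

On separable data the loop terminates with every training point strictly on the correct side, i.e. y_i(w · x_i + b) > 0 for all i.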

2.2.2 Dual Representation

The dual representation is an important and particularly useful form in machine learning. Assuming that the initial weight vector is the zero vector, the final hypothesis of the perceptron is a linear combination of the training points:

    w = Σ_{i=1}^{l} α_i y_i x_i        (2.2)

where the coefficient of x_i is given by its classification y_i, and the α_i are positive values proportional to the number of times misclassification of x_i has caused the weights to be updated.


The decision function can then be rewritten as follows:

    f(x) = w · x + b = Σ_{j=1}^{l} α_j y_j (x_j · x) + b        (2.3)

Table 2.2 shows the dual form of the perceptron algorithm.

Table 2.2 The dual form of the perceptron algorithm

    Given a linearly separable training set S
    α ← 0; b ← 0
    R ← max_{1≤i≤l} ||x_i||
    repeat
        for i = 1 to l
            if y_i(Σ_{j=1}^{l} α_j y_j (x_j · x_i) + b) ≤ 0 then
                α_i ← α_i + 1
                b ← b + y_i R²
            end if
        end for
    until no mistakes are made within the for loop
    return (α, b)

An important property of the dual representation is that the data only appear through entries in the Gram matrix and never through their individual attributes[24].
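The dual form of table 2.2 can be sketched as follows (toy data hypothetical); note that the training points enter only through the Gram matrix of pairwise inner products, and that the primal weight vector can be recovered afterwards via equation (2.2):

```python
import numpy as np

def perceptron_dual(X, y, max_epochs=100):
    """Dual perceptron as in table 2.2: alpha_i counts the updates caused
    by point i; the data appear only through the Gram matrix G."""
    l = len(X)
    G = X @ X.T                      # Gram matrix G[i, j] = x_i . x_j
    alpha = np.zeros(l)
    b = 0.0
    R = max(np.linalg.norm(x) for x in X)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i] * R ** 2
                mistakes += 1
        if mistakes == 0:            # converged
            break
    return alpha, b

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha, b = perceptron_dual(X, y)
w_rec = (alpha * y) @ X              # recover w = sum_i alpha_i y_i x_i (2.2)
```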

2.3 Non-linear Classification

Linear learning solves problems with linear functions; however, a simple linear function of the given attributes often cannot perform the target task flexibly. There are two main limitations of linear learning. First, the function to be learned may not have a simple representation and may not be easily verified in this way. Second, the training data are normally noisy, so there is no guarantee that an underlying function exists that correctly classifies them. Complex real-world problems therefore require more expressive hypothesis spaces than linear functions. This section discusses a method that constructs a non-linear machine to classify the data more flexibly.

2.3.1 Learning in Feature Space

The complexity of the target function to be learned depends on the way it is represented, and the difficulty of the learning task can vary accordingly[25]. Kernel representations offer a solution by constructing a mapping from the input space to a high dimensional feature space, increasing the power of linear learning for complex applications. Figure 2.6 shows an example of a mapping from a two-dimensional input space to a two-dimensional feature space. In the input space the data cannot be separated by a linear function, but the feature mapping simplifies the classification task since the data in the feature space are linearly separable.

Figure 2.6 A mapping from a two-dimensional input space to a two-dimensional feature space

The quantities introduced to describe the data are usually called features, while the original quantities are sometimes called attributes. The task of choosing the most suitable representation is known as feature selection. The space X is referred to as the input space, while F = {φ(x) : x ∈ X} is called the feature space[25]. By selecting an appropriate kernel function, a non-linear mapping is performed between the input space and a high dimensional feature space, so that each input vector matches a feature vector in the feature space. This mapping does not increase the number of tunable parameters, and the technique also overcomes the curse of dimensionality in both computation and generalization.
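The idea of figure 2.6 can be illustrated with a small example (the points and the feature map are hypothetical): data separated by a circle in the input space become linearly separable under the map φ(x1, x2) = (x1², x2²), since the circle x1² + x2² = 1 becomes the straight line z1 + z2 = 1 in feature space.

```python
import numpy as np

# Circular toy data (hypothetical): class +1 outside the unit circle,
# class -1 inside it; not linearly separable in the input space.
X = np.array([[0.2, 0.3], [-0.4, 0.1], [1.2, 0.5], [-1.0, -1.0]])
y = np.array([-1, -1, 1, 1])

def phi(x):
    """Feature map (x1, x2) -> (x1^2, x2^2)."""
    return np.array([x[0] ** 2, x[1] ** 2])

# In feature space the linear function f(z) = z1 + z2 - 1 separates the data.
Z = np.array([phi(x) for x in X])
predictions = np.sign(Z @ np.array([1.0, 1.0]) - 1.0)
```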


2.3.2 Implicit Mapping to Feature Space

The implicit mapping from input space into feature space expresses the data in a new representation, and a decision function is constructed that is non-linear in the input space but linear in the feature space, so that a linear learning machine can be used. Function (2.1) from the linear learning section is modified as:

    f(x) = w · φ(x) + b = Σ_{i=1}^{n} w_i φ_i(x) + b        (2.4)

where φ(x) is the mapping function. There are therefore two steps in constructing a non-linear machine: first, a fixed non-linear mapping transforms the data from the input space into a feature space; second, a linear machine is used to classify them in the feature space. In the dual representation, the decision function is:

    f(x) = Σ_{i=1}^{l} α_i y_i (φ(x_i) · φ(x)) + b        (2.5)

and the decision rule can be evaluated using the inner products between the training points and the test point. In this way, the dimension of the feature space does not affect the computation.

2.4 Kernels

Kernel functions provide methods that compute the inner product φ(x_i) · φ(x) in feature space directly from the original input points.

Definition 1. A kernel is a function K such that for all x, z ∈ X,

    K(x, z) = φ(x) · φ(z),

where φ is a mapping from X to an (inner product) feature space F [25].

A kernel constructs an implicit mapping from the input space into a feature space, and a linear machine is trained in the feature space. The Gram matrix, also called the kernel matrix, describes the information of the training data in the feature space. The key of this approach is to find a kernel function that can be evaluated efficiently. The decision rule can then be evaluated with at most l kernel evaluations:

    f(x) = Σ_{i=1}^{l} α_i y_i K(x_i, x) + b        (2.6)
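Definition 1 can be checked numerically for the degree-2 polynomial kernel on R², whose explicit feature map is φ(x) = (x1², √2·x1x2, x2²). The sketch below verifies that K(x, z) = ⟨x, z⟩² equals φ(x) · φ(z) on random points, so the kernel computes the feature-space inner product without ever constructing φ:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    """K(x, z) = <x, z>^2, computed directly in the input space."""
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(1)
ok = True
for _ in range(100):
    x, z = rng.normal(size=2), rng.normal(size=2)
    # phi(x).phi(z) = x1^2 z1^2 + 2 x1 x2 z1 z2 + x2^2 z2^2 = (x.z)^2
    ok = ok and np.isclose(np.dot(phi(x), phi(z)), poly2_kernel(x, z))
```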

2.4.1 Kernel Matrix

The training data enter the algorithm only through the entries of the Gram matrix, which is also called the kernel matrix. Each entry represents a measure of similarity between two objects. Equation (2.7) shows the form of the Gram matrix:

    K = | K(x_1, x_1)  ...  K(x_1, x_j)  ...  K(x_1, x_n) |
        |     ...      ...      ...      ...      ...     |
        | K(x_i, x_1)  ...  K(x_i, x_j)  ...  K(x_i, x_n) |        (2.7)
        |     ...      ...      ...      ...      ...     |
        | K(x_n, x_1)  ...  K(x_n, x_j)  ...  K(x_n, x_n) |

The Gram matrix is the central structure of kernel methods; it contains all the information the learning algorithm needs. Even if the number of features is infinite, the Gram matrix may still be small, and hence the optimization problem remains solvable.
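Building the Gram matrix of equation (2.7) takes only a few lines. The sketch below (toy points are hypothetical) computes it for a Gaussian kernel and checks its basic properties: symmetry, unit diagonal, and non-negative eigenvalues:

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # hypothetical points
K = rbf_gram(X)
```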

2.4.2 Properties of Kernels

Kernel functions are used to avoid working in the feature space when computing inner products. This section discusses the properties a function must satisfy to be a kernel for an input space.

Mercer's Theorem

Mercer's theorem provides the conditions for determining whether a function K(x, z) is a kernel. Obviously, a kernel must be symmetric:

    K(x, z) = φ(x) · φ(z) = φ(z) · φ(x) = K(z, x)


Proposition [25]. Let X be a finite input space with K(x, z) a symmetric function on X. Then K(x, z) is a kernel function if and only if the matrix

    K = (K(x_i, x_j))_{i,j=1}^{n}

is positive semi-definite (has non-negative eigenvalues).

Theorem [25]. Let X be a compact subset of R^n. Suppose K is a continuous symmetric function such that the integral operator T_K : L_2(X) → L_2(X),

    (T_K f)(·) = ∫_X K(·, x) f(x) dx,

is positive, that is,

    ∫_{X×X} K(x, z) f(x) f(z) dx dz ≥ 0   for all f ∈ L_2(X).

Then K(x, z) can be expanded in a uniformly convergent series (on X × X) in terms of the eigenfunctions φ_j ∈ L_2(X) of T_K, normalized so that ||φ_j||_{L_2} = 1, with positive associated eigenvalues λ_j ≥ 0:

    K(x, z) = Σ_{j=1}^{∞} λ_j φ_j(x) φ_j(z).
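For a finite input space, the proposition above gives a direct numerical test: evaluate the symmetric function on the points and check the eigenvalues of the resulting matrix. In this sketch (points hypothetical), a Gaussian function passes the test, while the negative distance −||x − z||, although symmetric, fails it: its matrix has zero trace and non-zero entries, so some eigenvalue must be negative.

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0]])   # hypothetical finite input space

def matrix_of(k, X):
    """Evaluate a candidate kernel on all pairs of points."""
    return np.array([[k(a, b) for b in X] for a in X])

# Valid kernel: Gaussian of the squared distance.
rbf = matrix_of(lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0), X)
# Symmetric but NOT a kernel: negative distance.
neg_dist = matrix_of(lambda a, b: -np.linalg.norm(a - b), X)

rbf_min_eig = np.linalg.eigvalsh(rbf).min()
neg_dist_min_eig = np.linalg.eigvalsh(neg_dist).min()
```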

2.4.3 Examples of Kernels

Many kernels have been proposed by researchers. This paper introduces some of them, including four basic kernels that are used frequently.

Linear

The linear kernel is the simplest model:

    K(x, z) = ⟨x, z⟩.


Polynomial

Polynomial mapping is a popular method for non-linear modeling:

    K(x, z) = ⟨x, z⟩^d.

To avoid problems with the Hessian becoming zero, a more preferable form is

    K(x, z) = (γ⟨x, z⟩ + r)^d,

where γ, r and d are kernel parameters and γ > 0.

Gaussian Radial Basis Function

The radial basis function is one of the kernels that has received significant attention; the form of the Gaussian radial basis function (GRBF) is

    K(x, z) = exp(−||x − z||² / (2σ²)).

There are three reasons why the Gaussian radial basis function is normally a reasonable choice in most applications. First, the GRBF kernel non-linearly maps the data into a higher dimensional feature space, so it can handle cases where the relation between target value and attributes is non-linear. Second, the results of a model depend on the values of the kernel parameters; in other words, the number of kernel parameters influences the complexity of the model, so it is better to select a kernel with as few parameters as possible, and the polynomial kernel has more parameters than the GRBF kernel. Third, the GRBF kernel has fewer numerical difficulties: the value of K(x, z) for the GRBF kernel lies in the interval (0, 1], while the value of K(x, z) for the polynomial kernel lies in the interval [0, ∞).


Exponential Radial Basis Function The form of the exponential radial basis function (ERBF) is

K(x, z) = exp(−‖x − z‖ / (2σ²)).

It produces a piecewise linear solution which can be attractive when discontinuities are acceptable.

Sigmoid The sigmoid function is so far the most common form of activation function used in artificial neural networks. It is a strictly increasing function that exhibits a graceful balance between linear and non-linear behavior. An SVM model using a sigmoid kernel function is equivalent to a two-layer, feed-forward neural network,

K(x, z) = tanh(γ⟨x, z⟩ + r).

Fourier Series A Fourier series [26] can be considered an expansion in the following 2n + 1 dimensional feature space. The kernel is defined on the interval [−π/2, π/2],

K(x, z) = sin((n + 1/2)(x − z)) / sin((x − z)/2).

However, this kernel is probably not a good choice because its regularization capability is poor, which is evident by consideration of its Fourier transform.

Kernels from Kernels More complicated kernels can be made from kernels. Additive kernels are constructed by summing kernels, since the sum of two positive definite functions is positive definite,

K(x, z) = Σᵢ Kᵢ(x, z).

Similarly, multiplicative kernels are obtained by forming tensor products of kernels,


K(x, z) = Πᵢ Kᵢ(x, z).

Multiplicative kernels are particularly useful for constructing multidimensional spline kernels. Furthermore, some other kernels can be derived from an existing kernel K, for example

K′(x, z) = aK(x, z),  K′(x, z) = exp(K(x, z)).
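As an illustration, the four basic kernels above can be written directly from their formulas. This is a minimal NumPy sketch, not the LIBSVM implementation, and the default parameter values are arbitrary:

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return float(np.dot(x, z))

def polynomial_kernel(x, z, gamma=1.0, r=1.0, d=3):
    # K(x, z) = (gamma * <x, z> + r)^d, with gamma > 0
    return (gamma * np.dot(x, z) + r) ** d

def grbf_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), values in (0, 1]
    return float(np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2)))

def sigmoid_kernel(x, z, gamma=0.5, r=0.0):
    # K(x, z) = tanh(gamma * <x, z> + r)
    return float(np.tanh(gamma * np.dot(x, z) + r))
```

Note that grbf_kernel(x, x) is always 1, which reflects the bounded range mentioned above.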

2.4.4 Kernel Selection Some kernels were introduced in section 2.4.3; however, they are by no means all the kernels that exist. The question that arises is how to choose the best one for solving a particular problem. [26] mentioned that the upper bound on the VC dimension is a potential avenue for comparing kernels; however, it requires estimating the radius of the hypersphere enclosing the data in the non-linear feature space. So far, methods such as bootstrapping and cross-validation might be the most reliable ways for kernel selection.

Cross-validation partitions a data sample into several subsamples and analyzes each single subsample based on the other subsamples. First the training data is divided into several folds; then one fold is considered as the validation set and the rest form the training set, and the accuracy of predicting the validation set is recorded. A new fold that was not a validation set before is exchanged with the validation set, and the accuracy on the new validation set is recorded. The process is repeated until all the folds have been analyzed. The average of the accuracies on all the validation sets is the cross-validation accuracy.

Besides kernel selection, the accuracy of an SVM model also depends on the selection of the model parameters. Grid search and pattern search are two methods for finding optimal parameters. Grid search performs a global search by geometric partition: in each search region a set of parameters is tried, and a region near the global optimum point is found after finishing the grid search. Pattern search starts at the center of the search space and makes trial steps in each direction for each parameter. If the model is improved, the search center is moved to the new point and the process is repeated. If no improvement is found, the search size in each direction is reduced and the search is tried again.
The process is stopped when the search size in each direction is reduced to a specified tolerance. The pattern search finds a local optimal point of the search space. Obviously, if both a grid search and a pattern search are used, the global optimal point can be found. In this case the grid search is performed first; after finishing the grid search, a point close to the


global optimum point is found. The pattern search then starts a compass search from the point found by the grid search. However, for a model with more than two parameters, a grid search is time-consuming because the model must be evaluated at each grid region. For example, if 10 search intervals are evaluated for each parameter, then a model with two parameters has to be evaluated at 10 × 10 = 100 points, a model with three parameters at 10 × 10 × 10 = 1000 points, and so on. That means the time complexity increases exponentially with the number of parameters. If cross-validation is used for evaluating each model, the calculation time is further multiplied by the number of cross-validation folds. Obviously, this approach may be inefficient, even infeasible, for large models. Some techniques have been proposed for model selection; minimizing bounds of leave-one-out (loo) errors [27] is an important and efficient approach for support vector machine model selection.
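The two procedures just described, k-fold cross-validation and an exhaustive grid search, can be sketched as follows. This is only an illustration: train_fn, predict_fn and evaluate are placeholders for any trainer and scorer (for instance a LIBSVM model whose score is its cross-validation accuracy), and the parameter grids are arbitrary:

```python
import numpy as np
from itertools import product

def cross_validation_accuracy(train_fn, predict_fn, X, y, n_folds=5):
    """Each fold serves once as the validation set while the remaining
    folds form the training set; the average accuracy over all
    validation folds is the cross-validation accuracy."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    accuracies = []
    for i in range(n_folds):
        val = folds[i]
        train = np.hstack([folds[j] for j in range(n_folds) if j != i])
        model = train_fn(X[train], y[train])
        accuracies.append(np.mean(predict_fn(model, X[val]) == y[val]))
    return float(np.mean(accuracies))

def grid_search(evaluate, C_grid, gamma_grid):
    """Evaluates every (C, gamma) grid point, so the cost grows
    exponentially with the number of parameters (10 x 10 = 100 points
    for two parameters with 10 intervals each)."""
    best_params, best_score = None, -np.inf
    for C, gamma in product(C_grid, gamma_grid):
        score = evaluate(C, gamma)  # e.g. cross-validation accuracy
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score
```

A pattern (compass) search could then refine the best grid point, as described above.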

2.5 Generalization The introduction of kernels greatly increases the expressive power of learning machines while retaining the underlying linearity that ensures that learning remains tractable. The increased flexibility, however, increases the risk of overfitting, as the choice of separating hyperplane becomes increasingly ill-posed due to the number of degrees of freedom [28]. The basic training principle of SVM is to find an optimal hyperplane such that the expected classification error for unseen test samples is minimized, i.e. good generalization. To control the generalization ability of a learning machine, two different factors have to be controlled: the error rate on the training data and the capacity of the learning machine as measured by its VC dimension. The two factors form a trade-off: the smaller the VC dimension of the set of functions of the learning machine, the smaller the confidence interval, but the larger the value of the error frequency. A general way of resolving this trade-off was proposed as the principle of structural risk minimization: for the given data set one has to find a solution that minimizes the sum of the two terms.

2.5.1 VC Dimension The VC dimension is a measure of the capacity of a given set of functions. It is defined as follows: each function of the class separates the patterns in a certain way and thus


induces a certain labeling of the patterns. Since the labels are in {±1}, there are at most 2^m different labelings for m patterns (Schölkopf and Smola, 2002). Figure 2.7(a) illustrates how three points in the plane can be shattered by the linear indicator function whereas four points cannot. The function sin(ax) shown in figure 2.7(b), however, has an infinite VC dimension. The linear indicator function in n-dimensional space has a VC dimension equal to n + 1.

Figure 2.7 Two examples of VC dimension

2.5.2 ERM vs SRM Most training algorithms for learning machines implement Empirical Risk Minimization (ERM). These algorithms implement a classification task by tuning adjustable parameters λ and learning the mapping x → y, which results in a possible mapping x → f(x, λ). The goal of these algorithms is to minimize the empirical risk

R_emp(f) = (1/l) Σ_{i=1..l} L(f(x_i), y_i),

which is known as the principle of ERM, where L is a loss function. It can easily result in overfitting for a particular problem.


Structural Risk Minimization (SRM) is a general complexity regularization method that automatically selects the model complexity that approximately minimizes the misclassification error probability of the empirical risk minimizer [29]. In contrast with ERM, the goal of SRM is to find a good trade-off between low empirical risk and low capacity of the machine. SRM considers the problem defined as follows:

R(f) ≤ R_emp(f) + sqrt( (h(ln(2l/h) + 1) − ln(η/4)) / l ).

The second term of this formula is the VC confidence, where 0 ≤ η ≤ 1 and the bound holds with probability 1 − η; h is the VC dimension and is a measure of the capacity of a learning machine.
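For concreteness, the VC confidence term can be evaluated numerically (a small sketch, with h, l and η as defined above):

```python
from math import log, sqrt

def vc_confidence(h, l, eta):
    """sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l): the VC confidence
    term of the bound, which holds with probability 1 - eta."""
    return sqrt((h * (log(2 * l / h) + 1) - log(eta / 4)) / l)
```

For a fixed sample size l, this term grows with the VC dimension h, which is exactly the trade-off the SRM principle exploits.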

Figure 2.8 The bound on actual risk of a classifier

Figure 2.8 shows the difference between ERM and SRM on the actual risk. It is clear that a learning machine with a large VC dimension will give a low empirical risk, but the VC confidence will also be large. According to statistical learning theory, the actual risk of a learning machine equals the empirical risk plus the VC confidence. The principle of SRM is to minimize both the empirical risk and the VC confidence. That means a learning machine with SRM will give the lowest actual risk, i.e. better generalization.

2.6 Maximum Margin Classifier The so-called maximal margin classifier finds a hyperplane that separates the data as far as possible; an example has been shown in Figure 2.2. It is the simplest classifier and can only be used for data that are linearly separable in the feature


space. Without loss of generality, consider a two-class classifier; the decision function of the classifier is

f(x) = sgn(⟨w · x⟩ + b).

If the data are linearly separable, then a pair w and b satisfying the following constraint can be found:

y_i(⟨w · x_i⟩ + b) ≥ 1 ∀i,  (2.8)

where the functional margin of canonical hyperplanes is defined as 1. If the functional margin is replaced with a geometric margin, the geometric margin of the linear classifier is 1/‖w‖. So, to realize the maximal margin hyperplane with geometric margin 1/‖w‖, the optimization problem is

Minimize_{w,b}  ⟨w · w⟩,
Subject to  y_i(⟨w · x_i⟩ + b) ≥ 1 ∀i.

Solving the optimization problem using the Lagrangian formulation gives

L(w, b, α) = (1/2)⟨w · w⟩ − Σ_{i=1..l} α_i [y_i(⟨w · x_i⟩ + b) − 1],

where α_i ≥ 0 are the Lagrange multipliers. Applying the dual formulation, L is differentiated with respect to w and b and set to zero:

∂L(w, b, α)/∂w = w − Σ_{i=1..l} y_i α_i x_i = 0,
∂L(w, b, α)/∂b = Σ_{i=1..l} y_i α_i = 0.

The new Lagrangian is obtained by substituting the above equations:

L(w, b, α) = Σ_{i=1..l} α_i − (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j ⟨x_i · x_j⟩.

Solving the new Lagrangian formulation gives the values of all α. w is obtained from ∂L(w, b, α)/∂w = 0, and b is found from the following conditions:

if α_i = 0, then y_i(⟨w · x_i⟩ + b) ≥ 1,
if α_i > 0, then y_i(⟨w · x_i⟩ + b) = 1.

So the optimization problem

Maximize  Σ_{i=1..l} α_i − (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j ⟨x_i · x_j⟩,
Subject to  Σ_{i=1..l} y_i α_i = 0,  α_i ≥ 0,

realizes the maximal margin hyperplane with geometric margin 1/‖w‖.

2.7 Soft Margin Classifier A maximal margin classifier can only be used for data that are linearly separable in the feature space. To handle real-world problems where the data are not linearly separable in the feature space, a soft margin classifier that tolerates noise and outliers is extended from the maximal margin classifier. Revising the constraint equation (2.8) by adding slack variables ξ, the new constraints are

y_i(⟨w · x_i⟩ + b) + ξ_i ≥ 1 ∀i,
ξ_i ≥ 0 ∀i.  (2.9)

In that case some data are allowed to be misclassified; figure 2.9 shows an example of a soft margin classifier.


Figure 2.9 An example of a soft margin classifier

The optimization problem for the soft margin classifier is

Minimize_{ξ,w,b}  ⟨w · w⟩ + C Σ_{i=1..l} ξ_i^k,

where C and k define the cost of constraint violation. For positive integers k, the above optimization problem is a convex programming problem; if k = 1 or k = 2, it is a quadratic programming (QP) problem. C is the upper bound of α and provides a trade-off between maximum margin and classification error. A higher value of C gives a larger penalty for classification errors, which means α is allowed to take a large value; a particular data point may then be classified correctly by increasing its α value. Applying the dual formulation, the 1-norm soft margin optimization problem is

Maximize  Σ_{i=1..l} α_i − (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j),
Subject to  Σ_{i=1..l} y_i α_i = 0,  0 ≤ α_i ≤ C.

2.8 Multi-class Classifier So far the classification problems discussed have been binary classification problems. Several methods have been proposed for solving multi-class problems; here the most widely used one is introduced briefly. Among the existing multi-class approaches, "one against one" (Knerr et al., 1990) is one of the most suitable methods for practical problems. It constructs k(k − 1)/2


classifiers, where k is the number of categories; each one is trained on data from two different classes, the mth and the nth, solving the following binary classification problem:

Minimize_{ξ,w,b}  (1/2)⟨w^{mn} · w^{mn}⟩ + C Σ_i ξ_i^{mn},
Subject to  y_i(⟨w^{mn} · x_i⟩ + b^{mn}) + ξ_i^{mn} ≥ 1, if x_i is in the mth class,
y_i(⟨w^{mn} · x_i⟩ + b^{mn}) − ξ_i^{mn} ≤ −1, if x_i is in the nth class,
ξ_i^{mn} ≥ 0.

A voting strategy is used in this approach: each binary classification is considered a vote that can be cast for all data points x. A point x is assigned to the class with the maximum number of votes. If two classes have the same number of votes, a simple strategy is to select the one with the smaller index.
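The voting strategy above can be sketched as follows. This is an illustration only: binary_classifiers is a hypothetical mapping from each class pair (m, n) to a trained pairwise classifier that returns either m or n for a given input:

```python
from itertools import combinations

def one_vs_one_predict(x, classes, binary_classifiers):
    """'One against one' voting: each of the k(k-1)/2 pairwise
    classifiers casts one vote, and x is assigned to the class with
    the most votes."""
    votes = {c: 0 for c in classes}
    for m, n in combinations(classes, 2):
        votes[binary_classifiers[(m, n)](x)] += 1
    best = max(votes.values())
    # ties are resolved by selecting the class with the smallest index
    return min(c for c in classes if votes[c] == best)
```

With k = 7 shape categories this uses 7 × 6 / 2 = 21 pairwise classifiers, which is how LIBSVM handles multi-class problems.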

2.9 Types of SVM

2.9.1 C-SVM Classification Given training vectors x_i ∈ R^n, C-SVC (Cortes and Vapnik, 1995; Vapnik, 1998) solves the following primal problem for binary classification with y_i ∈ {−1, 1}:

Minimize_{ξ,w,b}  (1/2)⟨w · w⟩ + C Σ_{i=1..l} ξ_i,
Subject to  y_i(⟨w · φ(x_i)⟩ + b) + ξ_i ≥ 1,  ξ_i ≥ 0,  i = 1, …, l.

Its dual formulation is

Maximize  Σ_{i=1..l} α_i − (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j),
Subject to  Σ_{i=1..l} y_i α_i = 0,  0 ≤ α_i ≤ C.

C is the upper bound of α_i. The decision function is


sgn( Σ_{i=1..l} y_i α_i K(x_i, x) + b ).
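Given a solved dual (the α_i, the corresponding training points and b), this decision function can be evaluated directly. A sketch, into which any kernel function K(x_i, x) can be plugged:

```python
import numpy as np

def csvm_decision(x, support_vectors, support_labels, alphas, b, kernel):
    """Evaluates sgn(sum_i y_i * alpha_i * K(x_i, x) + b); only the
    training points with alpha_i > 0 (the support vectors) contribute."""
    s = sum(y_i * a_i * kernel(x_i, x)
            for x_i, y_i, a_i in zip(support_vectors, support_labels, alphas))
    return 1 if s + b >= 0 else -1
```

With a linear kernel this reduces to sgn(⟨w · x⟩ + b), where w = Σ y_i α_i x_i.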

2.9.2 ν-SVM Classification The ν-support vector classification (Schölkopf et al., 2000) introduces a new parameter ν that controls the number of margin errors and support vectors. The primal problem for binary classification is

Minimize_{ξ,w,b,ρ}  (1/2)⟨w · w⟩ − νρ + (1/l) Σ_{i=1..l} ξ_i,
Subject to  y_i(⟨w · φ(x_i)⟩ + b) + ξ_i ≥ ρ,  ξ_i ≥ 0,  i = 1, …, l,  ρ ≥ 0.

Its dual formulation is

Maximize  −(1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j),
Subject to  Σ_{i=1..l} y_i α_i = 0,  0 ≤ α_i ≤ 1/l,  Σ_{i=1..l} α_i ≥ ν.


Chapter Three Methodology


3.1 Introduction After introducing SVM theory, this chapter focuses on the implementation of road sign recognition using SVM. To construct the SVM classification model, LIBSVM, a library for support vector classification and regression, is used in this project. Some parts of the source code were modified so that the library has an object-oriented structure and can be assembled and extended more easily.

Figure 3.1 A block diagram of road signs classification using SVM

The road sign images used for recognition are binary images of 36 × 36 pixels. This size is not the only possibility; other binary image sizes, for example 28 × 28 pixels, are also practicable. However, all road sign data samples must use the same image size within one experiment. Two kinds of feature values of these images, binary image representation and Zernike moments, are presented in a special representation to the SVM for training and test. Data normalization is an important process that scales and evenly distributes the data into an acceptable range to reduce the complexity of computation. Figure 3.1 shows a block diagram of road sign classification using SVM. The results for the two kinds of feature values are analyzed and compared in chapter four. Besides recognizing road signs with two kinds of feature values, the task of this project is to implement two groups of road sign classification: one group is road sign shapes with seven categories and the other is speed limit signs with five categories. The categories of each group are shown in table 3.1 and table 3.2.


Table 3.1 Road sign shapes group (binary images omitted)
Road Signs: No Entry | Stop | Circle | Downward Triangle | Upward Triangle | Circle With Bar | No Standing

Table 3.2 Speed limit signs group (binary images omitted)
Road Signs: Speed limit 30 km/h | 50 km/h | 70 km/h | 90 km/h | 110 km/h

3.2 LIBSVM LIBSVM is a library for support vector classification and regression, developed by Chih-Chung Chang and Chih-Jen Lin [30]. It uses the "one against one" method to solve multi-class classification problems. LIBSVM consists of three modules:

Scaling. Scaling implements the process of data normalization: it scales and evenly distributes the data into an acceptable range. The main advantage is to prevent attributes in greater numeric ranges from dominating those in smaller numeric ranges; another advantage is to avoid numerical difficulties during the calculation [30].

Train. LIBSVM supports four basic kernels (linear, polynomial, radial basis function and sigmoid) and two types of SVM classification, C-SVM and ν-SVM. The SVM model is trained with the scaled data, after which a model file is created.

Predict. This module calculates the results for the test data using the model file created during the training process. The predicted results are saved in a predict file.

3.3 Data Representation The following notation is used for data representation to the SVM:

y  d1:x1  d2:x2  d3:x3 … dn:xn

where y is the desired output of the data sample, which identifies a class (for road sign shapes classification y = 0 to 6, since there are seven categories, while for speed limit signs classification y = 0 to 4, since there are five categories); xi is the ith attribute of an input vector x, with i = 1 to n; and di is the index of attribute xi, starting from 1 in ascending order. In the training data set the desired output y is used to supervise the SVM learning, while in the test data set y is used to verify the output of the SVM. If the desired output of the test data set is unknown, y can be any number; in that case, however, it cannot be verified whether the actual output is correct or not.
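Writing one data sample in this notation is a one-liner. This sketches the plain LIBSVM text format; a real sparse file would usually omit zero-valued attributes:

```python
def to_libsvm_line(y, x):
    """Formats a sample as 'y d1:x1 d2:x2 ... dn:xn', with attribute
    indices d_i starting from 1 in ascending order."""
    return str(y) + " " + " ".join(f"{i}:{v}" for i, v in enumerate(x, start=1))
```

One such line per sample, written to a text file, is the input format the LIBSVM train and predict modules expect.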

3.4 Data Normalization Before the data are presented to the SVM for training and test, normalization is an important process that scales and evenly distributes the data into an acceptable range. Kernel values normally depend on the inner products of feature vectors, so large attribute values can cause numerical problems. As with neural networks, the goal of data normalization is to reduce the range of the inputs and help the machine learn more easily. The process of data normalization is not, however, necessary for every problem; whether it is necessary for a particular problem depends on the input features and the restricted domain of the kernels. Performing data normalization for every problem is nevertheless recommended, since the process is very simple and it never makes the problem more complicated. Normally, data normalization uses a simple linear scaling method. The formula that transforms each original attribute D to an input value I is

I = I_min + (I_max − I_min) · (D − D_min) / (D_max − D_min),

where D_min and D_max are the minimum and maximum values of the original attribute, and I_min and I_max are the minimum and maximum values of the acceptable


range; in general the range is [0, 1] or [−1, 1]. For the normalization to be entirely reversible with little or no loss of accuracy, the value of I must be kept with enough precision.
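The scaling formula above translates directly into code (a sketch; the target range here defaults to [−1, 1]):

```python
def linear_scale(d, d_min, d_max, i_min=-1.0, i_max=1.0):
    """I = I_min + (I_max - I_min) * (D - D_min) / (D_max - D_min);
    D_min and D_max are taken over the original attribute's values."""
    return i_min + (i_max - i_min) * (d - d_min) / (d_max - d_min)
```

The same D_min and D_max computed on the training data must be reused when scaling the test data, so that both sets are mapped consistently.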

3.5 Binary Representation Binary representation is the most straightforward and simplest method to present a binary image: 0 denotes black pixels and 1 denotes white pixels. Since in this project each binary image is saved at 36 × 36 pixels, there are 1296 attributes in total for one input vector. Figure 3.2 shows an example of a no entry road sign.

Figure 3.2 An example of binary representation of road sign
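Building the 1296-attribute input vector from a 36 × 36 binary image is then a simple row-by-row flattening (a NumPy sketch):

```python
import numpy as np

def binary_image_to_vector(image):
    """Flattens a 36x36 binary image (0 = black, 1 = white) into a
    1296-attribute input vector, row by row; all samples in one
    experiment must share the same image size."""
    image = np.asarray(image)
    assert image.shape == (36, 36), "unexpected image size"
    return image.reshape(-1)
```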

3.6 Zernike Moments Moments have been widely used in computer vision applications, especially in pattern recognition. A set of features extracted from an image can be used to classify the pattern: a collection of moments can be computed to capture the global features of an image and used as a feature vector for classifying the pattern. However, the number of moments that can be computed is infinite, so it is important to efficiently compute a finite subset of the moments that discriminates between patterns.


Depending on the specific application and type of patterns, different moments may be more useful than others. In this project Zernike moments are selected to extract the features of road signs. Teague proposed Zernike moments based on the basis set of orthogonal Zernike polynomials [31]. Zernike moments have been proven superior to other moments since they have many important properties, such as rotation invariance, low sensitivity to noise, near-zero information redundancy, expression efficiency, fast computation and multi-level representation, and they even allow image reconstruction.

3.6.1 Definition of Zernike Moments The kernel of Zernike moments [32] is a set of orthogonal Zernike polynomials defined over the polar coordinate space inside a unit circle. The two-dimensional Zernike moments of order p with repetition q of an image intensity function f(x, y) are defined as

Z_pq = ((p + 1)/π) ∬_{x² + y² ≤ 1} f(x, y) V*_pq(x, y) dx dy.  (3.1)

For a discrete image, if f(x, y) is the current pixel, equation 3.1 can be written as

Z_pq = ((p + 1)/π) Σ_x Σ_y f(x, y) V*_pq(x, y),  where x² + y² ≤ 1.  (3.2)

The Zernike polynomials of order p with repetition q, V_pq(x, y), are defined as

V_pq(x, y) = R_pq(r_xy) e^(−jq θ_xy),  (3.3)

where r_xy = sqrt(x² + y²), θ_xy = tan⁻¹(y/x), and the real-valued radial polynomial R_pq(r) is given as

R_pq(r) = Σ_{k=0..(p−|q|)/2} (−1)^k (p − k)! / ( k! ((p + |q|)/2 − k)! ((p − |q|)/2 − k)! ) r^(p−2k),  (3.4)

where 0 ≤ |q| ≤ p and p − |q| is even.
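The radial polynomial of equation (3.4) can be computed directly from factorials. A sketch; for a full Zernike moment one would still multiply by the angular factor of (3.3) and sum over the pixels as in (3.2):

```python
from math import factorial

def radial_polynomial(p, q, r):
    """R_pq(r) from equation (3.4); requires 0 <= |q| <= p and
    p - |q| even."""
    q = abs(q)
    assert q <= p and (p - q) % 2 == 0
    return sum(
        (-1) ** k * factorial(p - k)
        / (factorial(k) * factorial((p + q) // 2 - k) * factorial((p - q) // 2 - k))
        * r ** (p - 2 * k)
        for k in range((p - q) // 2 + 1))
```

For instance, R_00(r) = 1, R_11(r) = r and R_20(r) = 2r² − 1, matching the first few Zernike radial polynomials.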


3.6.2 Image Normalization To achieve pattern recognition that is irrespective of image size, position and orientation, the extracted features of an image should be translation, scale and rotation invariant. Zernike moments have the property of rotation invariance; however, to achieve translation and scale invariance the image needs to be normalized before calculating the Zernike moments. In [33], the pre-processing of the road sign image is done in three steps:

Binarizing. The chosen RGB road sign image is converted into a binary image.

Coordinate conversion. Coordinate conversion achieves translation invariance by moving the origin to the centroid of the object. The pixel f(x, y) of the original image is transformed into f(x + x_cen, y + y_cen) of another image, where (x_cen, y_cen) is the object centroid. The pixel coordinates of the new image are then obtained as x' = y − y_cen and y' = x − x_cen.

Mapping coordinates onto a circle. First, using the Euclidean distance, the distance d_max of the farthest object pixel from the centroid (x_cen, y_cen) is calculated by d_max = sqrt((x_cen − x')² + (y_cen − y')²). Then a circle with radius d_max is drawn around the object. Regarding the circle as a unit circle (i.e. d_max = 1), the coordinates of each object pixel are mapped into the unit circle by calculating x'' = x'/d_max and y'' = y'/d_max. Translation invariance is thus achieved in the second step of the pre-processing, while scale invariance is carried out as Z'_pq = Z_pq/(M × N); in our case M × N = 2(d_max · d_max).
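The centroid translation and unit-circle mapping steps can be sketched as follows. An illustration only: the row/column coordinate conventions here are the usual NumPy ones and are not necessarily identical to the axis conventions of [33]:

```python
import numpy as np

def normalize_to_unit_circle(image):
    """Shifts object pixel coordinates to the centroid (translation
    invariance), then divides by the distance d_max of the farthest
    object pixel so that every object pixel falls inside the unit
    circle (scale normalization)."""
    rows, cols = np.nonzero(image)
    x = cols - cols.mean()            # centred coordinates
    y = rows - rows.mean()
    d_max = np.sqrt(x ** 2 + y ** 2).max()
    return x / d_max, y / d_max
```

The returned coordinates are what the polar quantities r_xy and θ_xy of section 3.6.1 are computed from.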


3.6.3 Moments Representation The value p is the order of the Zernike moments. Different orders represent different features of an image: the low order moments represent the global shape of a pattern, and the higher order moments represent more detailed information. Baddam [33] chose the (p, q) of the Zernike moments from (5,1) to (12,12), in which case 40 moments are calculated (table 3.3). Each Zernike moment is an attribute of the image, so there are 40 attributes in total for one input vector.

Table 3.3 The Zernike moments representation of road sign
Z(5,1) Z(5,3) Z(5,5)
Z(6,0) Z(6,2) Z(6,4) Z(6,6)
Z(7,1) Z(7,3) Z(7,5) Z(7,7)
Z(8,0) Z(8,2) Z(8,4) Z(8,6) Z(8,8)
…
Z(12,0) Z(12,2) Z(12,4) Z(12,6) Z(12,8) Z(12,10) Z(12,12)


Chapter Four Experiments and Analysis


4.1 Introduction As mentioned earlier, this project implements a road sign recognition model using SVM; the methodology for constructing the model was introduced in chapter three. Two different experiments were performed to test the effects of classification with different features: one was to train and test the recognition model with binary images (binary representation), and the other was to train and test the recognition model with features such as Zernike moments (Zernike moments representation). Furthermore, four different experiments were performed to test and analyze the effects of classification with different SVM kernels (linear, polynomial, RBF and sigmoid) and two kinds of SVM types, C-SVM and ν-SVM. Finally, the results of two recognition models, Fuzzy ARTMAP and SVM, are compared from various aspects. A database of road sign images for training and testing the SVM recognition model has been acquired. The data samples contain 250 binary images from five categories of speed limit signs (figure 4.1) and 350 binary images from seven categories of road sign shapes (figure 4.2). Each road sign category has 50 data samples, which were divided randomly into two sets, 30 for training and 20 for test, in every experiment. The recognition model was trained on the training set and then tested on the test set. The following subsections describe the experimental setup in more detail.


(a) Class 0   (b) Class 1   (c) Class 2   (d) Class 3   (e) Class 4

Figure 4.1 Some binary images and their corresponding categories of speed limit signs for recognition


(a) Class 0   (b) Class 1   (c) Class 2   (d) Class 3   (e) Class 4   (f) Class 5   (g) Class 6

Figure 4.2 Some binary images and their corresponding categories of road sign shapes for recognition


4.2 Classification with Different Features

4.2.1 Binary Representation

The data represented by the binary representation contain seven categories of road sign shapes and five categories of speed limit signs. Each category and its desired output are shown in tables 4.1 and 4.2.

Table 4.1 Desired outputs of road sign shapes with binary representation: the seven shape categories are assigned the desired outputs 0 to 6.

Table 4.2 Desired outputs of speed limit signs with binary representation: the five speed limit categories are assigned the desired outputs 0 to 4.

To train the SVM recognition model, the linear kernel and the C-SVM type were chosen with parameter C = 1. To evaluate the SVM recognition model from various sides, not only the best but also the worst classification results were observed.

Classification of Road Sign Shapes

Ten pairs of training/test data sets were created. Each pair was selected randomly, without repetition, from the database of road sign shape images. For each road sign shape category there were 30 training instances and 20 test instances, so every pair of data sets contained 210 instances in the training set and 140 instances in the test set.


Table 4.3 The correct classification rate of road sign shapes on the ten pairs of training/test data sets with binary representation

  Pairs of Data Sets    Training Set (210)    Test Set (140)
  10                    100%                  100%

All these instances were presented to the SVM with the binary representation. The SVM model was trained and tested ten times, each time with a different pair of training/test data sets. The experimental results on all ten pairs of data sets were identical and extremely high, achieving 100% accuracy (table 4.3). Tables 4.4 and 4.5 show the confusion matrices on the training and test sets for the classification of road sign shapes. None of the instances was classified incorrectly in either training or test classification.

Table 4.4 Confusion matrix of road sign shapes classification with binary representation on training set

  Desired Output    0     1     2     3     4     5     6     Error Classified
  0                 30                                        0
  1                       30                                  0
  2                             30                            0
  3                                   30                      0
  4                                         30                0
  5                                               30          0
  6                                                     30    0


Table 4.5 Confusion matrix of road sign shapes classification with binary representation on test set

  Desired Output    0     1     2     3     4     5     6     Error Classified
  0                 20                                        0
  1                       20                                  0
  2                             20                            0
  3                                   20                      0
  4                                         20                0
  5                                               20          0
  6                                                     20    0
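Confusion matrices such as tables 4.4 and 4.5 can be computed directly from the desired and predicted labels; a minimal pure-Python sketch:

```python
def confusion_matrix(desired, predicted, n_classes):
    """Rows are desired outputs, columns are the classifier's outputs."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for d, p in zip(desired, predicted):
        m[d][p] += 1
    return m

def errors_per_class(matrix):
    """Off-diagonal counts per row, i.e. the 'Error Classified' column."""
    return [sum(row) - row[i] for i, row in enumerate(matrix)]

# A perfect test run over 7 shape classes, 20 instances each, as in table 4.5:
desired = [c for c in range(7) for _ in range(20)]
predicted = list(desired)
m = confusion_matrix(desired, predicted, 7)
print([m[i][i] for i in range(7)])  # [20, 20, 20, 20, 20, 20, 20]
print(errors_per_class(m))          # [0, 0, 0, 0, 0, 0, 0]
```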

Classification of Speed Limit Signs

In the same way, ten pairs of training/test data sets were created from the database of speed limit sign images. Each pair contained 150 instances in the training set and 100 instances in the test set. The SVM was trained and tested with these ten pairs of data sets. Table 4.6 shows the experimental results; the number in brackets is the number of instances classified correctly.

Table 4.6 The correct classification rate of speed limit signs on the ten pairs of training/test data sets with binary representation

  Pairs of Data Sets    Training Set (150)    Test Set (100)
  1                     100%                  100%
  4                     100%                  99% (99)
  5                     100%                  98% (98)

Only one pair of training/test data sets achieved the ideal result of 100% accuracy on both training and test classification. Four pairs of data sets achieved 100% accuracy on the training set but 99% on the test set. The worst result was 100% accuracy on the training set but 98% on the test set. The confusion matrices of one worst result are shown in tables 4.7 and 4.8.


Table 4.7 Confusion matrix of one worst pair of results for speed limit signs classification with binary representation on training set

  Desired Output    0     1     2     3     4     Error Classified
  0                 30                            0
  1                       30                      0
  2                             30                0
  3                                   30          0
  4                                         30    0

Table 4.8 Confusion matrix of one worst pair of results for speed limit signs classification with binary representation on test set

  Desired Output    0     1     2     3     4     Error Classified
  0                 19          1                 1
  1                 1     19                      1
  2                             20                0
  3                                   20          0
  4                                         20    0

Results Analysis

Normally, the road sign images that were classified incorrectly have very poor image quality. Table 4.9 shows two examples of the confused images that were misclassified most frequently. In the worst experiment above, the first example was classified incorrectly as speed limit sign 70 and the second example was classified incorrectly as speed limit sign 30.

Table 4.9 Two examples of road sign images that were classified incorrectly with binary representation: one misclassified as speed limit sign 70, the other as speed limit sign 30.


It seems that the number 30 is more similar to the number 50 than to the number 70. In the first case, however, it was classified as 70. One reason is that there is much pepper noise around the number 30, and some instances of speed limit sign 70 have a similar feature (figure 4.3). In the other case, the number 50 is badly damaged, which causes trouble for classification. Despite the existence of noise and damage, the SVM still achieved 100% accuracy for speed limit sign classification with binary representation on one pair of training/test data sets.

Figure 4.3 Some instances of speed limit sign 70 with pepper noise.

4.2.2 Zernike Moments Representation

Zernike moments have the property of rotation invariance, which means they do not distinguish between upward and downward triangles. So there are only six categories of road sign shapes under the Zernike moments representation. Tables 4.10 and 4.11 show every category and its desired output for road sign shape classification and speed limit sign classification.

Table 4.10 Desired outputs of road sign shapes with Zernike moments representation: the six shape categories are assigned the desired outputs 0 to 5.

Table 4.11 Desired outputs of speed limit signs with Zernike moments representation: the five speed limit categories are assigned the desired outputs 0 to 4.

The SVM recognition model was trained with the Zernike moments representation using the same parameters as with the binary representation: kernel: linear; SVM type: C-SVM; C = 1. Similarly, both the best and the worst classification results were observed.

Classification of Road Sign Shapes

Ten pairs of training/test data sets were selected randomly from the database of road sign shapes represented with Zernike moments. With this representation, every pair of training/test data sets contained 180 training instances and 120 test instances. Table 4.12 shows the results on these ten pairs of data sets.

Table 4.12 The correct classification rate of road sign shapes on the ten pairs of training/test data sets with Zernike moments representation

  Pairs of Data Sets    Training Set (180)    Test Set (120)
  4                     100%                  100%
  3                     100%                  99.17% (119)
  1                     100%                  98.33% (118)
  1                     100%                  97.5% (117)
  1                     100%                  96.67% (116)

Four pairs of data sets achieved 100% accuracy on both training and test classification, while the others achieved 100% accuracy on the training set but a few incorrect classifications on the test set. The confusion matrices of the worst result are shown in tables 4.13 and 4.14.


Table 4.13 Confusion matrix of the worst pair of results for road sign shapes classification with Zernike moments representation on training set

  Desired Output    0     1     2     3     4     5     Error Classified
  0                 30                                  0
  1                       30                            0
  2                             30                      0
  3                                   30                0
  4                                         30          0
  5                                               30    0

Table 4.14 Confusion matrix of the worst pair of results for road sign shapes classification with Zernike moments representation on test set

  Desired Output    0     1     2     3     4     5     Error Classified
  0                 20                                  0
  1                       20                            0
  2                             20                      0
  3                                   19          1     1
  4                                   2     17    1     3
  5                                               20    0

Classification of Speed Limit Signs

Ten pairs of training/test data sets were selected randomly from the database of speed limit signs represented with Zernike moments; every pair contained 150 training instances and 100 test instances. Table 4.15 shows the results on these data sets.


Table 4.15 The correct classification rate of speed limit signs on the ten pairs of training/test data sets with Zernike moments representation

  Pairs of Data Sets    Training Set (150)    Test Set (100)
  1                     100%                  82% (82)
  1                     100%                  81% (81)
  1                     99.33% (149)          92% (92)
  1                     99.33% (149)          84% (84)
  1                     99.33% (149)          84% (84)
  1                     98.67% (148)          91% (91)
  1                     98.67% (148)          87% (87)
  1                     98.67% (148)          85% (85)
  1                     98.67% (148)          84% (84)
  1                     98.67% (148)          80% (80)

Unfortunately, none of these pairs of data sets achieved 100% accuracy on both training and test classification; only two training sets reached 100% correct classification. The confusion matrix of the worst training result is shown in table 4.16, and the confusion matrix of the worst test result in table 4.17.

Table 4.16 Confusion matrix of the worst training result for speed limit signs classification with Zernike moments representation

  Desired Output    0     1     2     3     4     Error Classified
  0                 30                            0
  1                 1     29                      1
  2                 1           29                1
  3                                   30          0
  4                                         30    0


Table 4.17 Confusion matrix of the worst test result for speed limit signs classification with Zernike moments representation

  Desired Output    0     1     2     3     4     Error Classified
  0                 16    1     2           1     4
  1                 2     16                2     4
  2                       1     19                1
  3                       2     1     17          3
  4                 2     3           3     12    8

Results Analysis

Comparing the experimental results of the SVM recognition model using the binary representation with those using Zernike moments, for both road sign shape classification (table 4.3 vs table 4.12) and speed limit sign classification (table 4.6 vs table 4.15), the SVM recognition model with the Zernike moments representation obviously does not work as well as the one with the binary representation. Table 4.18 shows some instances that were misclassified with the Zernike moments representation.

Table 4.18 Some instances of road signs that were classified incorrectly with Zernike moments representation

  Road Sign             Classified Incorrectly As
  "Circle with bar"     "NO STANDING"
  Speed Limit 30        Speed Limit 50
  Speed Limit 30        Speed Limit 70
  Speed Limit 50        Speed Limit 110
  Speed Limit 90        Speed Limit 50

Zernike moments extract features from a binary image, reducing the dimension of the attributes from 1296 to 40. So the choice of Zernike moments is a key factor for efficient computation and discrimination between patterns. The low-order moments represent the global shape of a pattern and the higher orders the details. Padilla-Vivanco et al. [34] presented an example that reconstructed a binary image using different orders of Zernike moments, as shown in figure 4.4. Evidently, the higher the order of the Zernike moments, the more accurate the reconstructed image; in other words, Zernike moments of higher order preserve more detailed features of the original image. However, a higher order increases the computational cost.

Figure 4.4 Reconstructed image with different orders of Zernike moments

This project used the (p, q) of Zernike moments from (5, 1) to (12, 12); that is, the lowest order is 5 and the highest order is 12. It is not difficult to imagine that with a highest order of 12, some road sign shapes are occasionally hard to distinguish from others. For speed limit signs, likewise, the similar shapes caused them to be misclassified more frequently when the SVM model was trained and tested with Zernike moments.
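The feature dimension of 40 follows from counting the valid Zernike index pairs (p, q) between orders 5 and 12, where 0 ≤ q ≤ p and p − q is even; a quick check:

```python
def zernike_indices(min_order, max_order):
    """Enumerate the valid Zernike index pairs (p, q): 0 <= q <= p
    with p - q even, for orders p between min_order and max_order."""
    return [(p, q)
            for p in range(min_order, max_order + 1)
            for q in range(0, p + 1)
            if (p - q) % 2 == 0]

idx = zernike_indices(5, 12)
print(idx[0], idx[-1], len(idx))  # (5, 1) (12, 12) 40
```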

4.3 Classification with Different Kernels and SVM Types

The experiments in section 4.2 used the simplest linear kernel and the C-SVM type to train the SVM recognition model. This section focuses on analyzing the effects of using different kernels and SVM types. The SVM recognition model was trained with each of the four basic kernels (linear, polynomial, RBF and sigmoid) and the two SVM types (C-SVM and ν-SVM). Four groups of experiments were performed, and each group used the same pair of training/test data sets. The other parameters of the SVM model were: C = 1, ν = 0.5, γ = 1/n, r = 0, d = 3,

where n is the number of attributes of an input vector. Tables 4.19 and 4.20 show the experimental results obtained when the SVM was trained and tested with the binary representation for road sign shape classification and speed limit sign classification. The number in parentheses is the number of instances classified correctly.

Table 4.19 The correct classification rate of road sign shapes using different kernels and SVM types with binary representation

  SVM Type    Kernel        Training Set (210)    Test Set (140)
  C-SVM       Linear        100%                  100%
  C-SVM       Polynomial    100%                  97.86% (137)
  C-SVM       RBF           100%                  100%
  C-SVM       Sigmoid       100%                  99.29% (139)
  ν-SVM       Linear        100%                  100%
  ν-SVM       Polynomial    100%                  97.86% (137)
  ν-SVM       RBF           100%                  100%
  ν-SVM       Sigmoid       99.52% (209)          100%

Table 4.20 The correct classification rate of speed limit signs using different kernels and SVM types with binary representation

  SVM Type    Kernel        Training Set (150)    Test Set (100)
  C-SVM       Linear        100%                  98% (98)
  C-SVM       Polynomial    100%                  96% (96)
  C-SVM       RBF           100%                  97% (97)
  C-SVM       Sigmoid       99.33% (149)          97% (97)
  ν-SVM       Linear        100%                  98% (98)
  ν-SVM       Polynomial    100%                  96% (96)
  ν-SVM       RBF           100%                  97% (97)
  ν-SVM       Sigmoid       100%                  98% (98)

Tables 4.21 and 4.22 show the experimental results obtained when the SVM was trained and tested with the Zernike moments representation for road sign shape classification and speed limit sign classification.


Table 4.21 The correct classification rate of road sign shapes using different kernels and SVM types with Zernike moments representation

  SVM Type    Kernel        Training Set (180)    Test Set (120)
  C-SVM       Linear        100%                  100%
  C-SVM       Polynomial    87.22% (157)          85.83% (103)
  C-SVM       RBF           98.89% (178)          99.17% (119)
  C-SVM       Sigmoid       96.67% (174)          99.17% (119)
  ν-SVM       Linear        98.33% (177)          99.17% (119)
  ν-SVM       Polynomial    98.33% (177)          97.5% (117)
  ν-SVM       RBF           98.33% (177)          99.17% (119)
  ν-SVM       Sigmoid       98.33% (177)          99.17% (119)

Table 4.22 The correct classification rate of speed limit signs using different kernels and SVM types with Zernike moments representation

  SVM Type    Kernel        Training Set (150)    Test Set (100)
  C-SVM       Linear        100%                  82% (82)
  C-SVM       Polynomial    70% (105)             56% (56)
  C-SVM       RBF           89.33% (134)          72% (72)
  C-SVM       Sigmoid       74% (111)             68% (68)
  ν-SVM       Linear        93.33% (140)          78% (78)
  ν-SVM       Polynomial    93.33% (140)          85% (85)
  ν-SVM       RBF           94% (141)             79% (79)
  ν-SVM       Sigmoid       93.33% (140)          76% (76)

In general, ν-SVM works better than C-SVM. Nevertheless, many experiments revealed that C-SVM performed better than ν-SVM with the linear kernel. Since the linear kernel is the most suitable choice for the SVM recognition model, C-SVM is the most suitable SVM type as well.

Results Analysis

The experimental results show that the linear kernel performs better than the other kernels. One reason is that a linear kernel normally performs well when the number of attributes is large. The leading cause, however, is that the accuracy of an SVM model depends on the selection of the model parameters. The formulas of the four basic kernels are:

    Linear:      K(x, z) = ⟨x, z⟩
    Polynomial:  K(x, z) = (γ⟨x, z⟩ + r)^d
    RBF:         K(x, z) = exp(−γ ‖x − z‖²)
    Sigmoid:     K(x, z) = tanh(γ⟨x, z⟩ + r)

where γ, d and r are kernel parameters. There are no parameters in the linear kernel, but three, one and two parameters respectively in the polynomial, RBF and sigmoid kernels. In addition, C and ν are two important parameters of the SVM model. In our experiments, these parameters were set to C = 1, ν = 0.5, γ = 1/n, r = 0 and d = 3.
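The four kernel functions can be written down directly; a pure-Python sketch using the default parameter values from the experiments (the test vectors here are illustrative):

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear(x, z):
    return dot(x, z)

def polynomial(x, z, gamma, r, d):
    return (gamma * dot(x, z) + r) ** d

def rbf(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid(x, z, gamma, r):
    return math.tanh(gamma * dot(x, z) + r)

x, z = [1.0, 0.0], [1.0, 0.0]
n = len(x)                           # gamma = 1/n, r = 0, d = 3 as above
print(linear(x, z))                  # 1.0
print(polynomial(x, z, 1/n, 0, 3))   # (0.5 * 1 + 0)^3 = 0.125
print(rbf(x, z, 1/n))                # identical vectors -> 1.0
print(sigmoid(x, z, 1/n, 0))         # tanh(0.5)
```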

The parameter C defines the upper bound of α in the C-SVM model; it is a trade-off between maximizing the margin and minimizing the classification error. Figure 4.5 shows the performance of the SVM model using the linear kernel and C-SVM type with different values of C for speed limit recognition with the Zernike moments representation. There is no kernel parameter in the linear kernel, so C is the only variable in this model. A higher value of C allows α to take larger values, and the accuracy of classification can be improved by increasing α; however, it is useless to define an excessive upper bound of α.

Figure 4.5 The performances of the SVM model using linear kernel and different parameter C for speed limit recognition with Zernike moments representation

The ν-SVM uses another parameter, ν, to control the number of margin errors and support vectors. Figures 4.6 and 4.7 show the performance of the SVM model using the linear kernel and ν-SVM type with different values of ν for speed limit recognition with the Zernike moments representation. The number of support vectors increases as ν increases. However, since the number of margin errors increases as well, the accuracy of training classification decreases and, on the whole, the accuracy of test classification decreases also.

Figure 4.6 The support vectors of the SVM model using linear kernel and different parameter ν for speed limit recognition with Zernike moments representation

Figure 4.7 The performances of the SVM model using linear kernel and different parameter ν for speed limit recognition with Zernike moments representation

The RBF kernel has one parameter, γ, which is normally a very small value. In our experiments it was initialized as γ = 1/n, where n is the number of attributes of an input vector. Figure 4.8 illustrates the performance for different values of γ. The training classification improves as γ increases; however, the generalization of the SVM model decreases once γ exceeds a certain value. With the Zernike moments representation the number of attributes of an input vector is 40, so the initial value was γ = 0.025. For this experiment the best choice of γ is approximately 0.1, so the default value γ = 1/n is not always a good choice.
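The effect of γ can be seen directly from the RBF formula: with a very small γ every pair of points looks alike (kernel value near 1), while a very large γ makes every point dissimilar to all others (kernel value near 0), which hurts generalization. A small sketch with two hypothetical 40-attribute vectors:

```python
import math

def rbf(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# Two made-up 40-dimensional feature vectors (placeholder values):
x, z = [0.2] * 40, [0.3] * 40
for gamma in (0.0001, 0.025, 0.1, 1.0, 100.0):
    # Similarity drops monotonically as gamma grows.
    print(gamma, rbf(x, z, gamma))
```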

Figure 4.8 The performances of the SVM model using RBF kernel and C-SVM of different parameter γ for speed limit recognition with Zernike moments representation

Besides the parameter γ, the sigmoid kernel has another parameter, r, a coefficient that appears in both the sigmoid and polynomial kernels. It is not a very important parameter and is normally initialized to zero. Figure 4.9 shows the performance for different values of r: the accuracy of classification decreases as r increases.

Figure 4.9 The performances of the SVM model using sigmoid kernel and C-SVM of different parameter r for speed limit recognition with Zernike moments representation

There are three kernel parameters in the polynomial kernel: γ, r and d. Figure 4.10 shows the performance for different values of d. Smaller values of d are better choices; on the whole, the generalization of the SVM model decreases as d increases.


Figure 4.10 The performances of the SVM model using polynomial kernel and C-SVM of different parameter d for speed limit recognition with Zernike moments representation

In the above analysis, only one model parameter was changed at a time. In practice, changing one parameter affects the influence of the others whenever an SVM model has more than one parameter, and the more parameters a model has, the more complex its analysis becomes. Grid search is feasible for finding the optimal parameters of an SVM model when the number of parameters is small, normally less than three, but the time complexity of the search increases exponentially with the number of parameters, as analyzed in section 2.4.4. Heuristic search is a better choice for finding optimal parameters efficiently. Simulated annealing search is implemented in this project; it works efficiently, especially when the model has more than one parameter.

Grid Search

Grid search was implemented to find optimal parameters for these models. The real search space is infinite, so it is impossible to search all of it. An upper bound, a lower bound and a search step were defined for each parameter, so that the search space was divided into a geometric partition. Grid search works like exhaustive search; the step size controls the complexity of the search. Because the model must be evaluated at every grid point and the time complexity increases exponentially with the number of parameters, only the grid search of the RBF model was tested. The search regions of the parameters were defined as:

    C = {2^i | lower bound i = −5, upper bound i = 15, step i = 1},
    γ = {2^j | lower bound j = −15, upper bound j = 3, step j = 1},
    ν = {k | lower bound k = 0.1, upper bound k = 1, step k = 0.1}.

The grid search of the RBF model was tested using the same pairs of data sets as in tables 4.21 and 4.22. The results with and without grid search are contrasted in tables 4.23 and 4.24. With grid search, the results of the RBF model improved remarkably, sometimes becoming even better than the linear model. However, the parameter values that were found may not be the best choices; perhaps better parameters could be found by combining pattern search with grid search.
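For the C-SVM/RBF case this grid can be enumerated directly (the ν-SVM grid pairs the γ values with the ten ν values instead); a quick enumeration in Python:

```python
import itertools

# The C-SVM/RBF grid defined above: C = 2^i for i = -5..15, gamma = 2^j for j = -15..3.
C_grid = [2.0 ** i for i in range(-5, 16)]
gamma_grid = [2.0 ** j for j in range(-15, 4)]
grid = list(itertools.product(C_grid, gamma_grid))
print(len(C_grid), len(gamma_grid), len(grid))  # 21 19 399
```

Each of the 399 (C, γ) points requires training and evaluating the model once, which is why only the RBF model was grid-searched. Note that the values reported in tables 4.23 and 4.24 (C = 0.5 = 2^−1, γ = 0.125 = 2^−3, γ = 0.0019531 ≈ 2^−9) are indeed grid points.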

Table 4.23 The contrast with and without grid search for road sign shapes classification using RBF kernel with Zernike moments representation

  SVM Type    Kernel Parameters                     Training Set (180)    Test Set (120)
  C-SVM       Default: C = 1, γ = 1/n               98.89% (178)          99.17% (119)
  C-SVM       Grid search: C = 0.5, γ = 0.125       100%                  99.17% (119)
  ν-SVM       Default: ν = 0.5, γ = 1/n             98.33% (177)          99.17% (119)
  ν-SVM       Grid search: ν = 0.1, γ = 0.0019531   100%                  100%


Table 4.24 The contrast with and without grid search for speed limit signs classification using RBF kernel with Zernike moments representation

  SVM Type    Kernel Parameters                     Training Set (150)    Test Set (100)
  C-SVM       Default: C = 1, γ = 1/n               89.33% (134)          72% (72)
  C-SVM       Grid search: C = 2, γ = 0.125         100%                  87% (87)
  ν-SVM       Default: ν = 0.5, γ = 1/n             94% (141)             79% (79)
  ν-SVM       Grid search: ν = 0.1, γ = 0.0078125   100%                  85% (85)

Simulated Annealing Search

Simulated annealing (SA) search, also called Monte Carlo annealing, is derived from the process of physical crystal formation. To form a crystal, the material is first heated to a very high temperature until it reaches a molten state; the temperature is then reduced slowly until the crystal structure freezes in. The cooling process is very important: the structure of the crystal is bad if the cooling is done too quickly or too slowly. Normally, SA search starts from a random point of the search space. One neighbor of this point is selected randomly and evaluated. The new point is always accepted if it is evaluated better than the current one; otherwise it is accepted with some probability p, so a new neighbor that is worse than the current one can still be accepted. The probability of acceptance is defined as:

    p = 1 / (1 + e^((eval(Vc) − eval(Vn)) / T)),

where Vc is the current point, Vn is the new neighbor and T is an additional parameter regarded as the temperature. The search process is repeated until the stop

criterion is satisfied. During the search process, the temperature is reduced step by step. The neighbor of the current point is defined as a random point within a certain range around the current point. The size of the range should be big enough that the solution can escape local optima; of course, too big a range increases the search space of the neighbors. Assume that T = 0.1 and the current point Vc is evaluated to 0.5, i.e., eval(Vc) = 0.5. Then the probability of acceptance depends on the evaluation of the new point Vn, as shown in table 4.25.

Table 4.25 Probability of acceptance for T=0.1 and eval(Vc)=0.5

Eval(Vn)   Eval(Vc)-Eval(Vn)   exp((Eval(Vc)-Eval(Vn))/T)   p
0.1        0.4                 54.59815003                  0.02
0.2        0.3                 20.08553692                  0.05
0.3        0.2                 7.389056099                  0.12
0.4        0.1                 2.718281828                  0.27
0.5        0                   1                            0.5
0.6        -0.1                0.367879441                  0.73
0.7        -0.2                0.135335283                  0.88
0.8        -0.3                0.049787068                  0.95
0.9        -0.4                0.018315639                  0.98
1          -0.5                0.006737947                  0.99
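The acceptance probability and the values in table 4.25 can be reproduced with a few lines of code. Python is used here purely for illustration; the thesis program itself is written in C++:

```python
import math

def acceptance_probability(eval_current, eval_new, temperature):
    """Probability of accepting a new neighbor in SA search.

    Better neighbors (eval_new > eval_current) get p > 0.5; worse
    neighbors can still be accepted, with probability p < 0.5.
    """
    return 1.0 / (1.0 + math.exp((eval_current - eval_new) / temperature))

# Reproduce two entries of table 4.25 (T = 0.1, eval(Vc) = 0.5):
print(round(acceptance_probability(0.5, 0.1, 0.1), 2))  # 0.02
print(round(acceptance_probability(0.5, 0.6, 0.1), 2))  # 0.73
```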

The temperature is initialized with a very big value, making the process similar to a random search at the beginning. The temperature is reduced gradually, so that the search process behaves more and more like an ordinary hill-climber. The cooling ratio is another important element of the SA search. As in the process of crystal formation, the solution is not good if the cooling is done too quickly or too slowly. Figure 4.11 illustrates the performance of SA search with different cooling ratios.


[Figure 4.11: classification accuracy of the training and test sets plotted against cooling ratios from 0.1 to 0.9]

Figure 4.11 The performance of SA search with different cooling ratios

In this project, the parameters of the SA search were defined as: initial temperature 100, cooling ratio 0.5, and 60 iterations. Tables 4.26 and 4.27 contrast grid search and SA search. The SA search worked efficiently: it found optimal parameters after searching only 60 points. Furthermore, the results of SA search were sometimes better than those of grid search.
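The SA parameter search described above can be sketched as follows. This is a minimal illustration, not the program's actual implementation: the evaluate function is a hypothetical stand-in that trains the SVM on a candidate point (here the pair (log2 C, log2 γ)) and returns its accuracy, and the search ranges and neighbor step size are assumptions:

```python
import math
import random

def sa_search(evaluate, initial_temp=100.0, cooling_ratio=0.5, iterations=60):
    """Simulated annealing over a 2-D point (log2 C, log2 gamma).

    `evaluate` is assumed to train an SVM with the given parameters
    and return its accuracy in [0, 1].
    """
    current = (random.uniform(-5, 15), random.uniform(-15, 3))
    current_eval = evaluate(*current)
    best, best_eval = current, current_eval
    temperature = initial_temp
    for _ in range(iterations):
        # a random neighbor within a certain range around the current point
        neighbor = (current[0] + random.uniform(-2, 2),
                    current[1] + random.uniform(-2, 2))
        neighbor_eval = evaluate(*neighbor)
        # acceptance probability; delta is clamped to avoid overflow in exp
        delta = (current_eval - neighbor_eval) / temperature
        p = 1.0 / (1.0 + math.exp(min(delta, 700.0)))
        if neighbor_eval > current_eval or random.random() < p:
            current, current_eval = neighbor, neighbor_eval
        if current_eval > best_eval:
            best, best_eval = current, current_eval
        temperature *= cooling_ratio  # reduce the temperature step by step
    return best, best_eval
```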

Table 4.26 Contrast of grid search and SA search for road sign shapes classification using the RBF kernel with Zernike moments representation (correct classification rates)

Model    Search Way    Search Points   Training Set (180)   Test Set (120)
C-SVM    Grid Search   399             100%                 99.17% (119)
C-SVM    SA Search     60              100%                 100%
ν-SVM    Grid Search   399             100%                 100%
ν-SVM    SA Search     60              100%                 100%


Table 4.27 Contrast of grid search and SA search for speed limit signs classification using the RBF kernel with Zernike moments representation (correct classification rates)

SVM Type   Search Way    Search Points   Training Set (150)   Test Set (100)
C-SVM      Grid Search   399             100%                 87% (87)
C-SVM      SA Search     60              100%                 87% (87)
ν-SVM      Grid Search   399             100%                 85% (85)
ν-SVM      SA Search     60              100%                 86% (86)

4.4 Comparison of Different Recognition Models

The research work on traffic and road sign recognition at Dalarna University began in 2005, when a Fuzzy ARTMAP recognition model for classifying traffic and road signs was developed. Later, Aenugula and Baddam [33, 35] applied the Zernike moments method to extract features from binary images of road signs and classified the features using the Fuzzy ARTMAP recognition model. Moreover, feature selection using PCA (principal component analysis) and LDA (linear discriminant analysis) was introduced and analyzed. This project has implemented traffic and road sign classification using an SVM recognition model, with two kinds of features of the road sign images, binary image and Zernike moments, represented to the recognition model for comparison and analysis. All the data sets that were trained and tested with the above recognition models were created from the same database of road sign images, a small database in which each category has 30 samples. 25 training instances and 5 test instances were selected from each category to train and test the recognition models. Table 4.28 shows the best classification results obtained by the different recognition models for both road sign shapes and speed limit signs. Although SVM is a newer machine learning technology, the SVM recognition model clearly outperformed the Fuzzy ARTMAP recognition model in every respect.


Table 4.28 Best classification results from different recognition models

Recognition Model                 Road Sign Shapes        Speed Limit Signs
                                  Training    Test        Training    Test
Fuzzy ARTMAP & Zernike Moments    98%         96.67%      93.76%      76%
Fuzzy ARTMAP & Zernike & PCA      95%         90%         88.32%      48%
Fuzzy ARTMAP & Zernike & LDA      100%        100%        98.56%      96%
SVM & Binary Images               100%        100%        100%        100%
SVM & Zernike Moments             100%        100%        99.2%       100%


Chapter Five
Conclusion and Future Work


SVM was introduced by Vapnik in 1992 and quickly gained attention due to a great number of theoretical and computational merits. SVM is rooted in statistical learning theory and follows the principle of structural risk minimization to control the generalization ability of a learning machine. To solve a classification problem, SVM constructs a feature space by using a kernel function and separates the data into categories in that feature space.

This project has implemented an SVM recognition model for traffic and road sign classification, focusing on recognizing seven categories of road sign shapes and five categories of speed limit signs. A database of road sign images for training and testing the SVM recognition model has been provided. All pairs of training/test data sets are selected randomly from the database: for each category, 30 instances are selected for training and 20 instances for test. Two kinds of features, binary image and Zernike moments, are used to represent the data to the SVM for training and test. The binary image representation uses 0 and 1 to denote black and white pixels respectively, representing a binary road sign image to the SVM straightforwardly, while the Zernike moments representation extracts features from a binary image, so that a set of Zernike moments of an image is represented to the SVM. In the experiments, the performances of the SVM recognition model with different features, different kernels and different SVM types were compared and analyzed. Moreover, the best classification results from the SVM recognition model and the Fuzzy ARTMAP recognition model were compared.

Classification using different features, binary image and Zernike moments. For both road sign shapes classification and speed limit signs classification, the SVM recognition model with binary representation had better performance than the one with Zernike moments, achieving 100% accuracy.

Classification using different kernels and SVM types. The SVM model was trained and tested using the same pair of data sets but different kernels. Many experiments showed that the linear kernel performed better than the others. The primary reason is that the accuracy of an SVM model depends on the selection of the model parameters: there are no parameters in the linear kernel, but at least one in each of the others, and the process of searching for the optimal parameters is time-consuming. With the linear kernel, the performance of C-SVM surpassed that of ν-SVM.

Classification using different recognition models. The performance results of the SVM recognition model were compared with those of the Fuzzy ARTMAP recognition model. The data sets classified by these models were created from the same database of road sign images. The SVM recognition model proved clearly preferable to the Fuzzy ARTMAP recognition model in every respect. In general, the SVM recognition model using the linear kernel and binary representation is the best combination for solving our problems.

Although the SVM recognition model achieved very good results, it should be verified further on a bigger database of road signs, so it is necessary to increase the size of the training/test data sets in further experiments. The (p, q) of the Zernike moments were chosen from (5,1) to (12,12) in this project. However, the performance results of these Zernike moments were not very satisfactory. Since higher order moments represent more detailed features of an image, a set of higher moments should be tested to improve the results. Of course, the time consumed in calculating the Zernike moments should be considered, so that the recognition can be achieved in real time.

For the polynomial, RBF and sigmoid models, the values of the kernel parameters decide the accuracy of the SVM recognition model. This project has implemented grid search and SA search to find optimal parameters for these models. After grid search, a set of suitable parameters was found and the performance of the RBF model was improved remarkably. However, grid search works like exhaustive search and is inefficient compared with SA search, which found similar optimal parameters after searching only 60 points.


Appendix A: User Manual

This is a user manual for the Road Sign Recognition using SVM program, focusing on introducing the use of the program. The program runs on Windows OS with GLUT. There are nine main menus in the program; the use of each part is introduced briefly in the following subsections.

A.1 Convert Image Data to SVM Data

This function converts binary image data into SVM data that will be used for training and test. Figure A.1 shows a use case of this function.

Figure A.1 A use case of converting image data to SVM data

At first, all the binary image files that will be used for training and test should be described in an image list file. This file contains all the information about these binary images, such as the name of each binary image, the number of categories, which category a binary image belongs to and so on. The format of this image list file is described in table A.1. The number of classes is the total number of categories; the number of input units is the total number of pixels in one binary image. For each class, the class number is the target value of the following instances, starting from 0; the class name is a name such as NOENTRY, STOP and so on; the lines that follow give the file names of the binary images belonging to that class. This image list file should be saved in the same directory as the binary images.

Table A.1 The format of image list file

#number of classes
#number of input units
1 #number of output units
#class #name
#number of class
…
#class #name
#number of class
…

The steps for using this function are:
1) Input the directory of an image list file. In the above use case, the file name of the image list file is “list.txt”.
2) Input a file name for saving the training data and a file name for saving the test data, for example “train.txt” for the training data and “test.txt” for the test data. If these files already exist, “y” is inputted to overwrite them; otherwise both file names will be asked for again.
3) Input the number of training instances and the number of test instances per category. This determines how many data will be selected for training and how many for test from each category.

After inputting all the terms, this function displays the binary images selected for test and the binary images selected for training. This selection process is performed randomly. The total number of training data and the total number of test data are displayed at the end. Figure A.2 shows an example. The training data file and the test data file are saved in the same directory as the program.

Figure A.2 Display of the binary images selected for test and the binary images selected for training

During the selection process, the program creates the following files, where “*train*” represents the name of the training file, for example “train.txt”, and “*test*” represents the name of the test file, for example “test.txt”:
1) “*train*.list” and “*test*.list”. Each line of these files contains the target value of a data item followed by the file name of its binary image. These files are mainly used for comparing the original results and the predicted results (see A.3).
2) “*train*.class”. Each line of this file contains the target value of a class followed by the name of the class. This file is mainly used for predicting an image file (see A.8).
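Since the program is built on LIBSVM, the conversion presumably produces LIBSVM-style data lines. A minimal sketch, assuming the standard sparse "label index:value" format; the helper name is hypothetical, not part of the program:

```python
def image_to_svm_line(target, pixels):
    """Convert one binary image (a flat list of 0/1 pixel values) into a
    LIBSVM-style data line: "<target> <index>:<value> ...".

    Indices are 1-based; zero-valued pixels are omitted (sparse format).
    """
    features = " ".join(f"{i}:{v}" for i, v in enumerate(pixels, start=1) if v != 0)
    return f"{target} {features}".rstrip()

# A 2x2 image of class 0 with two white pixels:
print(image_to_svm_line(0, [1, 0, 0, 1]))  # 0 1:1 4:1
```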

A.2 Convert Zernike Data to SVM Data

This function converts Zernike moments data into SVM data that will be used for training and test. Figure A.3 shows a use case of this function.

Figure A.3 A use case of converting Zernike moments data to SVM data

All the Zernike moments for every binary image have been calculated and saved in a Zernike moments data file. Table A.2 describes the format of a Zernike moments data file. Each row of this file describes one instance: the first value is the target value of the instance, and the remaining values are the Zernike moments of the instance. Values are separated by blanks.

Table A.2 The format of Zernike moments data file (each row: a target value followed by the Zernike moments of one instance, separated by blanks)

The operational steps of this function are almost the same as those of the previous function:
1) Input the directory of a Zernike moments data file. In the above use case, the data file is “speed50.txt”.
2) Input a file name for saving the training data and a file name for saving the test data.
3) Input the number of training instances and the number of test instances per category.

After inputting all the terms, this function displays the total number of training data and the total number of test data. The selection process is performed randomly. Figure A.4 shows an example. The training data file and the test data file are saved in the same directory as the program. During the selection process, the program creates “*train*.list” and “*test*.list” files that are mainly used for comparing the original results and the predicted results.
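Reading such a Zernike moments data file amounts to parsing one instance per line. A minimal sketch; the function name is hypothetical, not part of the program:

```python
def parse_zernike_file(path):
    """Read a Zernike moments data file: each non-empty line holds a target
    value followed by the moment values, separated by blanks."""
    instances = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            target = int(parts[0])          # first value: the target (class) value
            moments = [float(v) for v in parts[1:]]  # remaining values: the moments
            instances.append((target, moments))
    return instances
```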

Figure A.4 Display of the total number of training data and the total number of test data when converting Zernike moments to SVM data


A.3 Train and Test SVM with Default Values

A pair of training/test data sets created by the data conversion alternatives can be trained and tested using the SVM with the training and testing alternatives. Figure A.5 shows the steps of training and testing the SVM model with default values.

Figure A.5 The steps of training and testing the SVM model with default values

First, input the training and test data files created by the data conversion alternatives, for example “train.txt” and “test.txt”. The default values of the SVM model are set in the constructor of the svmTrain class; in the above case, the SVM type is C-SVM, the kernel type is RBF and so on. The result of training and testing the data using the SVM model is then displayed, as shown in figure A.6. During this training and test process, the program creates the following files, where “*train*” represents the name of the training file, for example “train.txt”, and “*test*” represents the name of the test file, for example “test.txt”:


1) “*train*.scale” and “*test*.scale”. These are the scaled data files produced by scaling the SVM training and test data.
2) “*train*.range”, created while scaling the SVM training data. It saves the range information for each attribute; the test data is scaled according to the “*train*.range” file.
3) “*train*.scale.model” saves the model file after training the SVM model. The SVM model predicts the test data according to this model file.
4) “*test*.scale.predict” saves the predicted results.

Figure A.6 Display of the training and test results

In the end, the program compares the “*test*.scale.predict” file with the “*test*.list” file, and then displays the wrongly classified data. Figure A.7 illustrates the process and the files created at every step.
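The scaling step can be illustrated as follows. This sketch assumes scaling into [0, 1] with per-attribute ranges computed from the training data only; the actual interval used by the program's “*train*.range” file may differ:

```python
def compute_ranges(train_rows):
    """Per-attribute (min, max) over the training data; rows are
    equal-length lists of feature values."""
    mins = [min(col) for col in zip(*train_rows)]
    maxs = [max(col) for col in zip(*train_rows)]
    return list(zip(mins, maxs))

def scale_row(row, ranges, lower=0.0, upper=1.0):
    """Scale one row into [lower, upper] using ranges taken from the
    TRAINING data, so test data is scaled consistently with training data."""
    scaled = []
    for v, (lo, hi) in zip(row, ranges):
        if hi == lo:
            scaled.append(lower)  # constant attribute: map to the lower bound
        else:
            scaled.append(lower + (upper - lower) * (v - lo) / (hi - lo))
    return scaled
```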


Figure A.7 An illustration of the process and the files created at every step

A.4 Train and Test SVM Manually

The model parameters can be set manually step by step. Figure A.8 illustrates some steps for setting the model parameters. As before, the training and test data files should be inputted first.

Choose the kernel function. Four basic kernels can be chosen; each one is listed after an integer number. To select a kernel, its corresponding number should be inputted; for example, inputting “1” selects the linear kernel.

Choose the SVM type. Five SVM types can be chosen, where C_SVC and NU_SVC are for classification and ONE_CLASS, EPSILON_SVR and NU_SVR are for regression. Each one has its corresponding number; for example, inputting “1” selects C_SVC.

Different kernel functions and SVM types have different model parameters. The default values of these parameters are shown at each step. The process of training and test is the same as the process of training and testing the SVM with default values.


Figure A.8 An illustration of the steps for setting the model parameters manually

A.5 Train and Test SVM from a Parameter File

The model parameters can be set from a parameter file. Table A.3 shows the format of the parameter file. Each row of this file sets the value of its corresponding parameter; some parameters take integer values, while others take float values.

Table A.3 The format of parameter file

#svm_type 1:C-SVC; 2:nu-SVC; 3:one-class; 4:epsilon-SVR; 5:nu-SVR
#kernel_type 1:linear; 2:polynomial; 3:RBF; 4:sigmoid
#degree
#gamma
#coef0
#00
#eps>0
#p>=0
#shrinking 1 or 0
#probability 1 or 0


Figure A.9 illustrates the steps for training and testing the SVM from a parameter file. A parameter file containing the values of every parameter is inputted after the training and test data files; in the case below, the parameter file is “train.par”. The process of training and test is the same as the process of training and testing the SVM with default values.

Figure A.9 A use case of training and testing SVM from a parameter file

A.6 Train and Test SVM with Grid Search

This function trains and tests the SVM model using the RBF kernel and either of the two SVM types with grid search. Figure A.10 shows an example of the process.


Figure A.10 A use case of training and testing the SVM model with grid search

Of course, a pair of training/test data files should be inputted first. Either of the two SVM types can be chosen with the RBF kernel to train on the training data file. The program implements grid search to find the best parameters for the SVM model; the best parameters are then used for predicting the test data.
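The grid search itself can be sketched as below. The 399 search points reported in tables 4.26 and 4.27 match a 21 × 19 exponential grid, so ranges of C = 2^-5 … 2^15 and γ = 2^-15 … 2^3 are assumed here; evaluate is a hypothetical stand-in for training the SVM and returning its accuracy:

```python
def grid_search(evaluate, log2c_range=range(-5, 16), log2g_range=range(-15, 4)):
    """Exhaustive search over an exponential (C, gamma) grid.

    With the default ranges this is 21 * 19 = 399 points, matching the
    search-point counts reported in tables 4.26 and 4.27. `evaluate` is
    assumed to return the accuracy for a (C, gamma) pair.
    """
    best_params, best_eval = None, -1.0
    for log2c in log2c_range:
        for log2g in log2g_range:
            acc = evaluate(2.0 ** log2c, 2.0 ** log2g)
            if acc > best_eval:
                best_params, best_eval = (2.0 ** log2c, 2.0 ** log2g), acc
    return best_params, best_eval
```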

A.7 Train and Test SVM with SA Search

This function trains and tests the SVM model using the RBF kernel and either of the two SVM types with SA search. Figure A.11 shows an example of the process. Either of the two SVM types can be chosen with the RBF kernel to train on the training data file. The program implements SA search to find the best parameters for the SVM model; the best parameters are then used for predicting the test data.


Figure A.11 A use case of training and testing the SVM model with SA search

A.8 Predict an Image File

Before predicting an image file, a model file must already exist; if it does not, the fourth to eighth steps can be done to create one. Moreover, a class file containing the target values and class names must also exist; the first step can be done to create the class file. Two files, a model file and a binary image file, should be inputted at this part. The image file is first converted to SVM data, and then its target value is predicted according to the model file. For converting the image file to SVM data, the “*train*.range” file must exist, as it is used for scaling the test data. During the process, the program creates the following files, where “*image*” represents the name of the image file, for example “noentry_19.bmp”:
1) “*image*.txt” saves the SVM data converted from the binary image.
2) “*image*.txt.scale” saves the scaled data.
3) “*image*.txt.scale.predict” saves the predicted result.

The image file can also be predicted with probability. Figure A.12 shows a use case of predicting an image file. If predicting with probability is chosen, a lower bound on the probability must be inputted. If the highest predicted probability is not bigger than this lower bound, the image is regarded as an unknown image; otherwise it is regarded as belonging to the class with the highest predicted probability. Figure A.13 illustrates the process of predicting with probability.
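The lower-bound rule for predicting with probability can be expressed compactly. This is a sketch; the function name and the dictionary input are illustrative, not the program's API:

```python
def predict_with_threshold(class_probs, lower_bound):
    """Given predicted class probabilities {class_name: p}, return the class
    with the highest probability, or "unknown" if that probability does not
    exceed the lower bound."""
    best_class = max(class_probs, key=class_probs.get)
    if class_probs[best_class] <= lower_bound:
        return "unknown"
    return best_class

print(predict_with_threshold({"STOP": 0.9, "NOENTRY": 0.1}, 0.5))  # STOP
print(predict_with_threshold({"STOP": 0.4, "NOENTRY": 0.35, "YIELD": 0.25}, 0.5))  # unknown
```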

Figure A.12 A use case of predicting an image file

Figure A.13 The process of predicting with probability


References

[1] "Intelligent Transportation Systems," Retrieved 09-07, 2006, from http://scout.wisc.edu/Reports/NSDL/MET/2003/met-030801-topicindepth.html.
[2] "Intelligent transportation system," Retrieved 09-07, 2006, from http://en.wikipedia.org/wiki/Intelligent_transportation_system.
[3] Arturo De La Escalera, Luis E. Moreno, Miguel Angel Salichs, and José María Armingol, "Road Traffic Sign Detection and Classification," 1997.
[4] "Warning sign," Retrieved 10-07, 2006, from http://en.wikipedia.org/wiki/Warning_signs.
[5] Hasan Fleyeh and Mark Dougherty, "Road and Traffic Sign Detection and Recognition," in 10th EWGT Meeting and 16th Mini-EURO Conference, Poland, 2005.
[6] X. Gao, N. Shevtsova, K. Hong, S. Batty, L. Podladchikova, A. Golovan, D. Shaposhnikov, and V. Gusakova, "Vision Models Based Identification of Traffic Signs," Retrieved 12-07, 2006, from http://nisms.krinc.ru/papers/CGIV02.pdf.
[7] Jun Miura, Tsuyoshi Kanda, and Yoshiaki Shirai, "An Active Vision System for Real-Time Traffic Sign Recognition," Retrieved 12-07, 2006, from http://www-cv.mech.eng.osaka-u.ac.jp/~jun/pdffiles/itsc2000.pdf.
[8] C. Y. Fang, C. S. Fuh, S. W. Chen, and P. S. Yen, "A Road Sign Recognition System Based on Dynamic Visual Model," Retrieved 12-07, 2006, from http://www.ice.ntnu.edu.tw/~violet/publicationlist/2003CVPR.pdf.
[9] A. De La Escalera, J. M. Armingol, and M. A. Salichs, "Traffic Sign Detection for Driver Support Systems," Retrieved 12-07, 2006, from http://www.uc3m.es/uc3m/dpto/IN/dpin04/fsr01.pdf.
[10] Hasan Fleyeh, "Traffic Pictures," Retrieved 12-07, 2006, from http://users.du.se/~hfl/traffic.
[11] A. De La Escalera, J. M. Armingol, and M. Mata, "Traffic sign recognition and analysis for intelligent vehicles," Image and Vision Computing, vol. 21, pp. 247–258, 2003.
[12] Hasan Fleyeh, "Color detection and segmentation for road and traffic signs," in IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 2004.
[13] Hasan Fleyeh, "A Novel Fuzzy Approach For Shape Determination of Traffic Signs," in Second Indian International Conference on Artificial Intelligence, India, 2005.
[14] Hasan Fleyeh, "Road and Traffic Sign Color Detection and Segmentation - A Fuzzy Approach," in Machine Vision Applications, Tsukuba Science City, Japan, 2005.
[15] Hasan Fleyeh, "Shadow And Highlight Invariant Colour Segmentation Algorithm For Traffic Signs," in IEEE Conference on Cybernetics and Intelligent Systems, Thailand, 2006.
[16] Hasan Fleyeh, "Traffic Signs Color Detection and Segmentation in Poor Light Conditions," Tsukuba Science City, 2005.
[17] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines, Preface. Cambridge University Press, 2000.
[18] Tom Mitchell, Machine Learning. McGraw-Hill Science/Engineering/Math, 1997.
[19] "Machine learning," Retrieved 01-06, 2006, from http://en.wikipedia.org/wiki/Machine_learning.
[20] "Inductive vs. Deductive Learning," Retrieved 01-06, 2006, from http://www.cs.utah.edu/classes/cs5350/slides/GeneralConcepts_4.pdf.
[21] Nils J. Nilsson, "Introduction to Machine Learning," Chapter 1, p. 2. Stanford, 1996.
[22] "Supervised learning," Retrieved 01-06, 2006, from http://en.wikipedia.org/wiki/Supervised_learning.
[23] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines, Chapter 1.2. Cambridge University Press, 2000.
[24] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines, Chapter 2.1. Cambridge University Press, 2000.
[25] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines, Chapter 3. Cambridge University Press, 2000.
[26] Steve R. Gunn, "Support Vector Machines for Classification and Regression," University of Southampton, 1998.
[27] Ming-Wei Chang and Chih-Jen Lin, "Leave-one-out Bounds for Support Vector Regression Model Selection," Retrieved 05-07, 2006, from http://www.csie.ntu.edu.tw/~cjlin/papers/svrbound.pdf.
[28] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines, Chapter 4. Cambridge University Press, 2000.
[29] Joel Ratsaby, "A Stochastic Gradient Descent Algorithm for Structural Risk Minimisation," Retrieved 07-07, 2006, from http://www.bgu.ac.il/~ratsaby/Publications/PDF/alt2003.pdf.
[30] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a Library for Support Vector Machines," Retrieved 15-07, 2006.
[31] M. Teague, "Image analysis via the general theory of moments," J. Opt. Soc. Am., vol. 70 (8), pp. 920–930, 1980.
[32] Chee-Way Chong, P. Raveendran, and R. Mukundan, "Translation invariants of Zernike moments," Pattern Recognition, vol. 36, pp. 1765–1773, 2003.
[33] Sruthi Baddam, "Recognition of Speed Limit Road Signs using Zernike Moments and Fuzzy ARTMAP," Master's thesis, Computer Engineering, Dalarna University, Borlänge, 2006.
[34] A. Padilla-Vivanco, A. Martínez-Ramírez, and F. Granados-Agustín, "Digital image reconstruction by using Zernike moments," Retrieved 10-07, 2006.
[35] Dinesh Reddy Aenugula, "Road and Traffic Sign Shapes Recognition using Zernike Moments and Fuzzy ARTMAP," Master's thesis, Computer Engineering, Dalarna University, Borlänge, Sweden, 2006.