A Fast Algorithm for License Plate Detection (LPD)


DCT vs. DWT Based License Plate Detection

Amr E. Rashed
Information Technology Deanship, Taif University, KSA
[email protected]

Abstract- Automatic license plate recognition (ALPR) is one of the most important applications of computer vision and image processing techniques in intelligent transportation systems. To recognize a license plate efficiently, the plate must first be detected, or located, in the vehicle image. This step is considered the most crucial step of an ALPR system, as it affects both the recognition rate and the speed of the system. In this paper, two techniques are proposed, one based on DCT features and the other on DWT features, combined with several neural network classifiers such as the multilayer perceptron (MLP), the modular neural network, and the generalized feed-forward neural network. In this method, histogram equalization is first used to increase contrast; vertical and horizontal edge detection are then applied to help remove the border and background; finally, image segmentation and ANN classification take place. The proposed techniques are explained and implemented in Matlab. We achieved about a 99.8% detection rate on a small dataset, with a mean square error of about 0.0063.

Keywords- license plate detection (LPD), license plate segmentation, discrete cosine transform (DCT), support vector machine (SVM), discrete wavelet transform (DWT), principal component analysis (PCA)

1. INTRODUCTION

License plate detection has been adopted widely in numerous applications such as unattended parking, security control of restricted areas, and stolen vehicle verification. Because of ambient lighting conditions, image perspective distortion, interfering characters, etc., it is difficult to detect license plates efficiently in complex conditions [1, 2].
In this approach, a camera captures the vehicle images, and a computer processes the captured images and recognizes the information on the license plate by applying various image processing and optical pattern recognition techniques. Prior to character recognition, the license plate must be located in the background vehicle image. This task is considered the most crucial step in the ALPR system, as it significantly influences the overall accuracy and processing speed of the whole system. Because of problems such as poor image quality, image perspective distortion, other disturbing characters or reflections on the vehicle surface, and color similarity between the license plate and the vehicle body, it is often difficult to locate the license plate accurately and efficiently [3, 4]. As a solution, we have implemented a system that can extract the license plate of a vehicle from an image given a set of constraints. In any object recognition system, there are two major problems to be solved: (i) detecting an object in a scene and (ii) recognizing it, with detection being an important prerequisite. In our system, the quality of the license plate detector is doubly important, since the make and model recognition subsystem uses the location of the license plate as a reference point when querying the car database. This paper is organized as follows: Section II presents general constraints and data collection; Section III gives a brief discussion of previous work; Section IV discusses the proposed technique and comparative results for different ANN topologies with DCT and DWT features; Section V presents the results and conclusion.


2. CONSTRAINTS AND DATA COLLECTION

2.1. Constraints

Due to the limited time available, a set of constraints has been placed on the system to make the problem more manageable. These constraints are: (i) a digital camera is used; (ii) images of the vehicle are taken at variable angles; (iii) images of the vehicle are taken from a roughly fixed distance (about 1-2 m); (iv) the vehicle is stationary when the image is taken; and (v) only Egyptian license plates are processed; see Fig.1.

Fig.1 An example of an Egyptian license plate, with dimensions 40×16 cm

2.2. Data Collection

The images of vehicles were taken with a Benq digital camera at three different resolutions. On average, the images were taken 1-2 m away from the vehicle. They were stored in color JPEG format on the camera. We use Matlab to convert the color JPEG images into grayscale raw format on the PC; see Figs. 2 and 3.

Fig.2 An example of the acquired data (color image)

Fig.3 An example of the acquired data (grayscale image)

3. Previous Work

Most LPR systems employ detection methods such as corner template matching [5] and Hough transforms [6, 7] combined with various histogram-based methods. Kim et al. [8] take advantage of the color and texture of Korean license plates (white characters on a green background, for instance) and train a Support Vector Machine (SVM) to perform detection. Their license plate images range in size from 79 × 38 to 390 × 185 pixels, and they report processing low-resolution input images (320 × 240) in over 12 seconds on a Pentium3 800MHz, with a 97.4% detection rate and a 9.4% false positive rate. Simpler methods, such as adaptive binarization of the entire input image followed by character localization, also appear to work, as shown by Naito et al. [9] and [10], but are used in settings with little background clutter and are most likely not very robust. Since license plates contain a form of text, we decided to treat the detection task as a text extraction problem. Of particular interest to us was the work by Chen and Yuille on extracting text from street scenes for reading for the blind [11]. Their work, based on the efficient object detection framework of Viola and Jones [12], uses a strong classifier with a good detection rate and a very low false positive rate. We found that this text detection framework also works well for license plate detection. A color-based filter, by contrast, raises problems of its own: license plates come in different colors according to the type of driving license.
The Saudi Arabian license plate recognition system [13] used a recognition algorithm based on the width-to-height ratio, but this method has several problems: (i) the width-to-height ratio differs from one car to another, depending on the distance between the camera and the car; (ii) small vertical edges complicate recognition because they change the width between edges; (iii) a different viewing angle can remove the desired vertical edges; and (iv) many other objects in the image can have the same width-to-height ratio.

4. The Proposed Technique

In this section the proposed algorithm is explained and then applied to the Egyptian license plate. The algorithm has four steps: (A) histogram equalization, (B) removal of border and background, (C) image segmentation, and (D) license plate detection.

4.1. Histogram Equalization


Histogram equalization is an image transformation that computes a histogram of the intensity levels in a given image and stretches it to span a wider range of intensities. This manipulation yields an image with higher contrast than the original. The process is based on the creation of a transfer function that maps the old intensity values to new intensity values. Histogram equalization is used to increase the contrast of the grayscale image; we apply this step to low-contrast images only, see Figs. 4 and 5.

Fig.4 Car before histogram equalization (intensity histogram shown over the range 0-250)

Fig.5 Result of histogram equalization
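The transfer-function construction described above can be sketched in a few lines of Python with NumPy; the normalization used here is the standard CDF-based one and is an assumption, since the paper relies on Matlab's built-in routine.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image.

    Builds the cumulative histogram (CDF) of the 256 intensity levels
    and uses it as the transfer function mapping old to new intensities.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()              # CDF value of the darkest present level
    # Standard normalization: map the CDF range onto [0, 255]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# Low-contrast test image: intensities crowded into [100, 130]
rng = np.random.default_rng(0)
img = rng.integers(100, 131, size=(64, 64)).astype(np.uint8)
out = equalize_histogram(img)
print(img.min(), img.max(), "->", out.min(), out.max())   # stretched to 0..255
```

The darkest intensity present always maps to 0 and the brightest to 255, which is exactly the contrast stretch the figure pair illustrates.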

4.2. Removal of Border and Background

As with the license plate recognition problem, detecting the car is the first step in performing make and model recognition (MMR). To this end, one can apply a motion segmentation method to estimate a region of interest (ROI) containing the car; this method would also be useful for make and model recognition in static images. We first convert the image to black and white and then use Matlab to remove the border of the image. The proposed system uses the Sobel edge detector because it shows better results. The threshold used by the edge detector is dynamic: the system takes an automatic value from the algorithm. The Sobel edge detector applies a 3×3 mask to the input image to give the resulting edge image; the edge detection step is not time consuming. First, the vertical edges are obtained using the Sobel edge detection function on both sides of the image, and the area outside the tall vertical edges is removed; see Figs. 6, 7 and 8. Next, the horizontal edges are obtained for the top and bottom of the image, and the area outside the wide horizontal edges is removed; see Fig.9. Fig.10 shows the car after the border and background have been removed.

Fig.6 Sobel vertical edges

Fig.7 Vertical edge regions

Fig.8 Image after removing small elements

Fig.9 Horizontal edges after removing small elements


Fig.10 Car after removing border and background
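The Sobel step can be sketched as follows in Python with NumPy. The threshold rule (a fraction of the maximum gradient magnitude) is an assumption standing in for Matlab's automatic threshold, and the synthetic image is purely illustrative.

```python
import numpy as np

def sobel_edges(img, direction="vertical"):
    """Binary edge map from a 3x3 Sobel mask with an automatic threshold."""
    img = img.astype(float)
    # Sobel masks: Gx responds to vertical edges, Gy to horizontal ones
    gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    k = gx if direction == "vertical" else gx.T
    h, w = img.shape
    grad = np.zeros((h - 2, w - 2))
    for i in range(3):                        # correlate the 3x3 mask
        for j in range(3):
            grad += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    grad = np.abs(grad)
    thresh = 0.5 * grad.max()                 # assumed automatic threshold rule
    return grad > thresh

# Synthetic image with one strong vertical edge starting at column 8
img = np.zeros((16, 16))
img[:, 8:] = 255
edges = sobel_edges(img, "vertical")
print(edges.any(axis=0).nonzero()[0])         # columns flagged around the step
```

Only the columns straddling the intensity step respond, which is what lets the algorithm discard the area outside the tall vertical edges.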

4.3. Image Segmentation

Often the license plate is in the lower half of the image, so we remove the upper half (the lower 1/3 of the image could be used, but for safety we keep the lower half). After these steps the new image is about 1/3 the size of the original, so the recognition algorithm runs much faster; see Fig.11.

Fig.11 Image after segmentation
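The cropping step above is a one-liner; this minimal sketch (the frame size is an assumption) shows how discarding the upper half shrinks the search area before classification.

```python
import numpy as np

# Stand-in for a grayscale frame; keep only the lower half,
# where the plate lies in this data set.
img = np.arange(240 * 320).reshape(240, 320)
lower = img[img.shape[0] // 2:, :]            # discard the upper half
print(img.shape, "->", lower.shape)
```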

4.4. License Plate Detection

4.4.1 Feature Extraction

Feature extraction is the transformation of the original data (using all variables) to a data set with a reduced number of variables. In the feature selection problem, the aim is to select those variables that contain the most discriminatory information. Alternatively, we may wish to limit the number of measurements we make, perhaps on grounds of cost, or we may want to remove redundant or irrelevant information to obtain a less complex classifier. In feature extraction, all variables are used and the data are transformed (using a linear or nonlinear transformation) to a reduced-dimension space. Thus, the aim is to replace the original variables by a smaller set of underlying variables. There are several reasons for performing feature extraction: (i) to reduce the bandwidth of the input data (with the resulting improvements in speed and reductions in data requirements); (ii) to provide a relevant set of features for a classifier, resulting in improved performance, particularly from simple classifiers; (iii) to reduce redundancy; and (iv) to recover new meaningful underlying variables or features, so that the data may be easily viewed and relationships and structure in the data identified [14].

4.4.2 DCT Coefficients

The DCT is a lossless and reversible mathematical transformation that converts a spatial amplitude representation of data into a spatial frequency representation. One of the advantages of the DCT is its energy compaction property: the signal energy is concentrated in a few components, while most other components are zero or negligibly small. The DCT was first introduced in 1974 and has since been used in many applications such as filtering, transmultiplexers, speech coding, image coding (still frame, video, and image storage), pattern recognition, image enhancement, and SAR/IR image coding.
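The energy compaction property can be demonstrated directly; the sketch below builds an orthonormal 2-D DCT-II from its cosine basis matrix (a textbook construction, not code from the paper) and measures how much energy a few coefficients capture for a smooth block.

```python
import numpy as np

def dct2(x):
    """Orthonormal 2-D DCT-II via the separable DCT matrix C:  C @ x @ C.T."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k.T + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)                # DC row normalization
    return C @ x @ C.T

# Smooth 8x8 block (a gentle intensity ramp, as in typical image regions)
x = np.fromfunction(lambda i, j: 100 + 5 * i + 3 * j, (8, 8))
X = dct2(x)
energy = X ** 2
top = np.sort(energy.ravel())[::-1][:4].sum() / energy.sum()
print(round(float(top), 4))   # fraction of total energy in the 4 largest coefficients
```

Because the transform is orthonormal, total energy is preserved (Parseval), and for smooth blocks nearly all of it collapses into a handful of low-frequency coefficients, which is what makes small DCT feature vectors effective.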
The DCT is widely used in image compression applications, especially lossy image compression. For example, the 2-D DCT is used in JPEG still image compression, MPEG moving image compression, and the H.261 and H.263 video-telephony coding schemes. The energy compaction property of the DCT is well suited for image compression since, in most images, the energy is concentrated in the low to middle frequencies, and the human eye is more sensitive to the middle frequencies. Unlike the optimal (Karhunen-Loève) transform, which is data dependent and therefore hard to implement, the DCT is data independent while still providing a very good compression ratio [15, 16].

4.4.3 DWT Coefficients

Wavelets have been demonstrated to give quality representations of images. Since DCT features give excellent classification performance in face and iris recognition, we compare DWT features against DCT features using nine classifiers and then compute the mean square error, true positives, and true negatives.
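For the wavelet side of the comparison, a single level of the 2-D Haar DWT can be written directly; the unnormalized average/difference form below is a minimal sketch (the paper does not specify which wavelet family or normalization it uses).

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar DWT, returning (LL, LH, HL, HH) subbands.

    Average/difference form: rows are filtered first, then columns.
    """
    a = (x[0::2] + x[1::2]) / 2.0             # row-pair average (lowpass)
    d = (x[0::2] - x[1::2]) / 2.0             # row-pair difference (highpass)
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0      # vertical-edge detail
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0      # horizontal-edge detail
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return LL, LH, HL, HH

# A vertical step edge shows up in the LH (vertical detail) band
x = np.full((8, 8), 50.0)
x[:, 3:] = 200.0
LL, LH, HL, HH = haar_dwt2(x)
print(LL.shape, float(np.abs(LH).max()), float(np.abs(HH).max()))
```

The LL band is a half-resolution approximation suitable as a compact feature vector, while the detail bands localize edges, which is why DWT coefficients are a natural competitor to DCT features here.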


Recently, Haar-like features have been widely used for object detection. A set of Haar-like features can represent the interior structure of an object invariantly under some transformations, and classifiers based on Haar-like features can detect objects against a complex background, independent of variations in color, illumination, position, and object size. However, one problem with these algorithms is that a large number of features is included in the classifier, which makes the system very complex and unstable. Chen and Yuille constructed a simple cascade classifier for text detection using statistical features; their algorithm could detect text regions in various natural scenes. However, only statistical features were used, which tends to result in a high false positive rate in practice [2].

5. Artificial Neural Networks (ANNs)

Neural networks have been shown to be excellent classification devices because of their inherent learning ability [17, 18]. Multilayer topologies are capable of learning non-linear decision boundaries, a fact which increases the versatility of neural networks in solving real-world problems. One type of "reduced-weight" architecture is the locally-connected, weight-sharing network. Notice that the hidden layer nodes accept only "local" information from the inputs. This type of architecture allows us to use the neural network as a local feature extractor, which in turn also reduces the set of weights in the network. The weight-sharing characteristic involves grouping hidden layer nodes into "feature maps", where each node within a map accepts inputs from a different section of the input layer. These nodes can then share the same synaptic weights and therefore look for the same "features" in different parts of the input matrix, allowing an even further reduction in network parameters. The locally-connected, weight-sharing network architecture is of great benefit when applied to texture classification.
We approached license plate detection as a text extraction problem [11]. The method we employ for detecting license plates can be described as follows. A window of interest, of roughly the dimensions of a license plate image, is scanned over the frame; the DCT features of the window contents are passed to a single-layer neural network (the weak classifier), whose output is 1 if the window appears to contain a license plate and 0 otherwise. The window is placed over all possible locations in the frame, the candidate license plate locations for which the classifier outputs a 1 are recorded, and these candidates are then passed to a multilayer neural network (the strong classifier); see Fig.12. Since the size of a license plate image can vary significantly with the distance from the car to the camera, using a single fixed-size window of interest is impractical. Window-based detection mechanisms often scan a fixed-size window over a pyramid of image scales; instead, we used three different window sizes, each with a custom-trained strong classifier for that scale. Scanning every possible location of every frame would be very slow were it not for the cascaded classifiers, which greatly speed up detection: the strong classifier need not be evaluated for most non-license-plate sub-regions; see also Fig.12.

Pipeline: i/p image → scanning window → weak classifier → strong classifier (multilayer neural network) → desired o/p; windows rejected by either stage are discarded.

Fig.12 A cascaded classifier. The early stage is very efficient and good at rejecting the majority of false windows.
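The cascade logic can be sketched as follows; the weak and strong tests here are hypothetical stand-ins for the single-layer and multilayer networks (the real classifiers operate on DCT features of the window), and all thresholds are illustrative assumptions.

```python
import numpy as np

def cascade_detect(img, win=(8, 16), stride=4,
                   weak=lambda w: w.max() > 50,       # cheap test: any bright pixel?
                   strong=lambda w: w.mean() > 200):  # stand-in for the trained ANN
    """Slide a fixed-size window and run a two-stage cascade.

    Only windows passing the cheap weak stage ever reach the
    (expensive) strong stage, which is the source of the speed-up.
    """
    h, w = win
    hits, strong_evals = [], 0
    for r in range(0, img.shape[0] - h + 1, stride):
        for c in range(0, img.shape[1] - w + 1, stride):
            window = img[r:r + h, c:c + w]
            if not weak(window):                      # reject cheaply
                continue
            strong_evals += 1
            if strong(window):
                hits.append((r, c))
    return hits, strong_evals

img = np.zeros((32, 64))
img[16:24, 24:40] = 255                               # bright "plate" region
hits, n = cascade_detect(img)
print(hits, n)                                        # one hit; few strong evaluations
```

Of the 91 window positions scanned, only the handful overlapping the bright region reach the strong stage, and only the window aligned with the plate is accepted.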

5.1 Multilayer Perceptron (MLP)

A multilayer perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The MLP is trained with the back-propagation learning technique. The MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable [19]. The confusion matrix tallies the results of all exemplars of the last epoch and computes the classification percentages for every output vs. desired combination. For example, in Table 3, 100% of the license exemplars were correctly classified while 0% were classified incorrectly as not-license; by contrast, only 33.3% of the not-license exemplars were correctly classified, while 66.6% were classified as license. Figure 13 shows the active cost of the MLP when using DWT features; Table 1 shows the performance of the network and Table 3 its active confusion matrix. Figure 14 shows the active cost of the MLP when using DCT features. One can see that the cost curve approaches zero, which means that the MLP has learned the problem; Table 2 shows the performance of the network and Table 4 its active confusion matrix.
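The confusion-matrix tally described above amounts to a row-normalized count; this toy sketch (the six exemplars are invented to reproduce the pattern of Table 3) shows the computation.

```python
import numpy as np

def confusion_percent(desired, output):
    """Row-normalized 2x2 confusion matrix, as percentages per desired class."""
    m = np.zeros((2, 2))
    for d, o in zip(desired, output):
        m[d, o] += 1
    return 100.0 * m / m.sum(axis=1, keepdims=True)

# Toy tally: 0 = license, 1 = not-license
desired = [0, 0, 0, 1, 1, 1]
output  = [0, 0, 0, 0, 0, 1]        # all license right, 2 of 3 not-license wrong
cm = confusion_percent(desired, output)
print(np.round(cm, 1))              # rows: desired class; columns: network output
```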

Fig.13 active cost curve for MLP (DWT features)

Fig.14 active cost curve for MLP (DCT features)

Table 1 The performance with DWT features, showing the mean squared error (MSE), the normalized mean squared error (NMSE), the correlation coefficient (R), the percent error (% error), Akaike's information criterion (AIC), and Rissanen's minimum description length (MDL) criterion.

MSE      0.388823738625
NMSE     0.712770789099
R        0.535941020032
% error  12.015745104870
AIC      393.550383740475
MDL      323.431857308871

Table 2 The performance with DCT features, same criteria as Table 1.

MSE      0.000636164466
NMSE     0.001227169109
R        0.999397493161
% error  0.568588640613
AIC      272.798931323
MDL      167.350820047167

Table 3 Active confusion matrix (percentage), DWT features.

              License    Not-license
License       100%       0%
Not-license   66.6666%   33.3333%

Table 4 Active confusion matrix (percentage), DCT features.

              License    Not-license
License       100%       0%
Not-license   0%         100%

5.2 Generalized Feed-Forward Networks

Generalized feed-forward networks are a generalization of the MLP in which connections can jump over one or more layers. In theory, an MLP can solve any problem that a generalized feed-forward network can solve; in practice, however, generalized feed-forward networks often solve the problem much more efficiently. A classic example is the two-spiral problem: without describing the problem in detail, it suffices to say that a standard MLP requires hundreds of times more training epochs than a generalized feed-forward network containing the same number of processing elements [20]. Figure 15 shows the active cost of the generalized feed-forward network when using DWT features; Table 5 shows the performance of the network and Table 7 its active confusion matrix. Figure 16 shows the active cost of the generalized feed-forward network when using DCT features; Table 6 shows the performance of the network and Table 8 its active confusion matrix.


Fig.15 active cost curve for Generalized Feed Forward network (DWT features).

Fig.16 active cost curve for Generalized Feed Forward network (DCT features).

Table 5 The performance with DWT features, showing the mean squared error (MSE), the normalized mean squared error (NMSE), the correlation coefficient (R), the percent error (% error), Akaike's information criterion (AIC), and Rissanen's minimum description length (MDL) criterion.

MSE      0.019819884703
NMSE     0.036332747866
R        0.982424605368
% error  2.775455378931
AIC      506.210052115753
MDL      403.369546682735

Table 6 The performance with DCT features, same criteria as Table 5.

MSE      0.004454328308
NMSE     156722744584458476.012000
R        0.00000
% error  3.283177736229
AIC      507.722419957924
MDL      353.065190085238

Table 7 Active confusion matrix (percentage), DWT features.

              License    Not-license
License       100%       0%
Not-license   0%         100%

Table 8 Active confusion matrix (percentage), DCT features.

              License    Not-license
License       80%        20%
Not-license   0          0

5.3 Modular Neural Network

Modular feed-forward networks are a special class of MLP. These networks process their input using several parallel MLPs and then recombine the results. This tends to create some structure within the topology, which fosters specialization of function in each sub-module. In contrast to the MLP, modular networks do not have full interconnectivity between their layers; therefore, a smaller number of weights is required for the same size of network (i.e. the same number of PEs). This tends to speed up training and reduce the number of required training exemplars. There are many ways to segment an MLP into modules; it is unclear how best to design the modular topology based on the data, and there is no guarantee that each module specializes its training on a unique portion of the data [20]. Figure 16 shows the active cost of the modular neural network when using DWT features; Table 9 shows the performance of the network and Table 11 its active confusion matrix. Figure 17 shows the active cost of the modular neural network when using DCT features; Table 10 shows the performance of the network and Table 12 its active confusion matrix.


Fig.16 active cost curve for Modular Neural network (DWT features).

Fig.17 active cost curve for Modular Neural network (DCT features).

Table 9 The performance with DWT features, showing the mean squared error (MSE), the normalized mean squared error (NMSE), the correlation coefficient (R), the percent error (% error), Akaike's information criterion (AIC), and Rissanen's minimum description length (MDL) criterion.

MSE      0.019749551989
NMSE     0.036203817712
R        0.983158872414
% error  2.505886232883
AIC      806.110514783641
MDL      653.185347613763

Table 10 The performance with DCT features, same criteria as Table 9.

MSE      0.001599693014
NMSE     1.#INF (overflow)
R        -1.#INF (overflow)
% error  1.386435760755
AIC      825.868790516245
MDL      514.212918998140

Table 11 Active confusion matrix (percentage), DWT features.

              License    Not-license
License       100%       0%
Not-license   0%         100%

Table 12 Active confusion matrix (percentage), DCT features.

              License    Not-license
License       71.4285%   28.5714%
Not-license   0          0

5.4 Jordan/Elman Network

Jordan and Elman networks extend the multilayer perceptron with context units, which are processing elements (PEs) that remember past activity. Context units provide the network with the ability to extract temporal information from the data. In the Elman network, the activity of the first hidden PEs is copied to the context units, while the Jordan network copies the output of the network [20]. Figure 18 shows the active cost of the Jordan/Elman network when using DWT features; Table 13 shows the performance of the network and Table 15 its confusion matrix. Figure 19 shows the active cost of the Jordan/Elman network when using DCT features; Table 14 shows the performance of the network and Table 16 its confusion matrix.

Fig.18 active cost curve for the Jordan/Elman network (DWT features).

Fig.19 active cost curve for the Jordan/Elman network (DCT features).
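The Elman context mechanism can be sketched in a few lines; all weight values below are illustrative assumptions (the network is not trained), serving only to show that the same input produces different outputs once the context units hold past hidden activity.

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, W_out):
    """One forward step of a minimal Elman network.

    The hidden activity is returned so the caller can copy it into the
    context units, giving the next step a summary of past activity.
    """
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    return W_out @ hidden, hidden

# Illustrative, untrained weights (assumptions)
W_in  = np.full((4, 3), 0.1)
W_ctx = np.full((4, 4), 0.5)
W_out = np.ones((1, 4))
x, context = np.ones(3), np.zeros(4)

y1, context = elman_step(x, context, W_in, W_ctx, W_out)   # context now nonzero
y2, context = elman_step(x, context, W_in, W_ctx, W_out)
print(float(y1[0]), float(y2[0]))   # same input, different output: memory at work
```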

Table 13 The performance with DWT features, showing the mean squared error (MSE), the normalized mean squared error (NMSE), the correlation coefficient (R), the percent error (% error), Akaike's information criterion (AIC), and Rissanen's minimum description length (MDL) criterion.

MSE      0.019190821857
NMSE     0.035179583651
R        0.985552861837
% error  2.957548561269
AIC      799.306952008975
MDL      647.383478073834

Table 14 The performance with DCT features, same criteria as Table 13.

MSE      0.000248029876
NMSE     1.#INF (overflow)
R        -1.#IND (undefined)
% error  0.571839122043
AIC      923.886270553116
MDL      419.607212146196

Table 15 Confusion matrix (percentage), DWT features.

              License    Not-license
License       100%       0%
Not-license   0%         100%

Table 16 Confusion matrix (percentage), DCT features.

              License    Not-license
License       10%        90%
Not-license   0          0

5.5 PCA (Principal Component Analysis)

Principal component analysis networks (PCAs) combine unsupervised and supervised learning in the same topology. Principal component analysis is an unsupervised linear procedure that finds a set of uncorrelated features, the principal components, from the input. An MLP is then trained in a supervised manner to perform the nonlinear classification from these components [14]. Figure 20 shows the active cost of the PCA network when using DWT features; Table 17 shows the performance of the network and Table 19 its active confusion matrix. Figure 21 shows the active cost of the PCA network when using DCT features; Table 18 shows the performance of the network and Table 20 its active confusion matrix.

Fig.20 active cost curve for PCA network (DWT features).

Fig.21 active cost curve for PCA network (DCT features).
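The unsupervised PCA front end can be sketched via the SVD; the data below are synthetic (an assumption), and the resulting scores are the uncorrelated features that would feed the supervised MLP stage.

```python
import numpy as np

def pca_features(X, k):
    """Project data onto the top-k principal components (unsupervised stage)."""
    Xc = X - X.mean(axis=0)                   # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                      # scores: uncorrelated features

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10)) @ rng.normal(size=(10, 10))  # correlated inputs
Z = pca_features(X, 3)
cov = np.cov(Z, rowvar=False)
off = cov - np.diag(np.diag(cov))
print(Z.shape, float(np.abs(off).max()))      # off-diagonal covariances ~ 0
```

The off-diagonal entries of the score covariance vanish (up to floating-point error), which is the "uncorrelated features" property the text relies on.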

Table 17 The performance with DWT features, showing the mean squared error (MSE), the normalized mean squared error (NMSE), the correlation coefficient (R), the percent error (% error), Akaike's information criterion (AIC), and Rissanen's minimum description length (MDL) criterion.

MSE      0.313512334759
NMSE     0.574713969443
R        0.661186724179
% error  11.557519252331
AIC      949.522335917970
MDL      785.578543165983

Table 18 The performance with DCT features, same criteria as Table 17.

MSE      0.001987056387
NMSE     1.#INF (overflow)
R        -1.#IND (undefined)
% error  0.737620609327
AIC      847.788990623073
MDL      461.627099279219

Table 19 Active confusion matrix (percentage), DWT features.

              License    Not-license
License       100%       0%
Not-license   66.6666%   33.3333%

Table 20 Active confusion matrix (percentage), DCT features.

              License    Not-license
License       57.1428%   42.8571%
Not-license   0          0

5.6 RBF/GRNN/PNN Network

Radial basis function (RBF) networks are nonlinear hybrid networks typically containing a single hidden layer of processing elements (PEs). This layer uses Gaussian transfer functions rather than the standard sigmoidal functions employed by MLPs. The centers and widths of the Gaussians are set by unsupervised learning rules, and supervised learning is applied to the output layer. These networks tend to learn much faster than MLPs. If a generalized regression (GRNN) / probabilistic (PNN) network is chosen, all the weights of the network can be calculated analytically; in this case, the number of cluster centers is by definition equal to the number of exemplars, and they are all set to the same variance. Use this type of RBF only when the number of exemplars is so small (
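The Gaussian hidden layer described above can be sketched directly; the centers and shared width below are illustrative values (in the GRNN/PNN variant each training exemplar would be a center).

```python
import numpy as np

def rbf_layer(x, centers, width):
    """Gaussian hidden layer of an RBF network: one unit per center."""
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)   # squared distances to centers
    return np.exp(-d2 / (2.0 * width ** 2))

# Illustrative centers and a single shared width (assumptions)
centers = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
h = rbf_layer(np.array([1.0, 1.0]), centers, width=1.0)
print(np.round(h, 4))   # strongest response from the nearest center
```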
