

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 62, NO. 2, FEBRUARY 2013

Application-Oriented License Plate Recognition

Gee-Sern Hsu, Member, IEEE, Jiun-Chang Chen, and Yu-Zu Chung

Abstract—We split the applications of vehicle license plate recognition (LPR) into three major categories and propose a solution with parameter settings that are adjustable for different applications. The three categories are access control (AC), law enforcement (LE), and road patrol (RP). Each application is characterized by variables of different variation scopes and thus requires different settings of the solution. The proposed solution consists of three modules for plate detection, character segmentation, and recognition. Edge clustering is formulated for solving plate detection for the first time. Character segmentation is also a novel application of the maximally stable extremal region (MSER) detector. A bilayer classifier, which is improved with an additional null class, is experimentally proven to be better than previous methods for character recognition. To assess the performance of the proposed solution, the application-oriented license plate (AOLP) database is composed and made available to the research community. Experiments show that the proposed solution outperforms many previous solutions, and that LPR can be better solved by solutions with settings oriented for different applications.

Index Terms—Character segmentation, plate detection, vehicle license plate recognition (LPR).

I. INTRODUCTION

Many recent works on vehicle license plate recognition (LPR) have described the variables considered; illumination, camera viewpoint, and the distance from the camera to the plate are among the most common [1]–[12]. Although these variables and others are mentioned, few works, if any, discuss the variation scope of each variable in different applications, let alone the impact that variables with different variation scopes have on the LPR solution. When a vehicle passes a passage monitored by a surveillance camera with a fixed focal length, the orientation and size of the captured plates differ only marginally from one another. However, when the camera is installed on a patrolling vehicle and takes the image of a vehicle from an arbitrary viewpoint, each of the aforementioned variables has a large variation scope. The variables and, particularly, the variation scope of each variable differ from one application to another.

Manuscript received February 12, 2012; revised May 28, 2012, July 30, 2012, and September 28, 2012; accepted October 10, 2012. Date of publication October 24, 2012; date of current version February 12, 2013. The review of this paper was coordinated by Dr. M. S. Ahmed. G.-S. Hsu and Y.-Z. Chung are with the Artificial Vision Laboratory, Department of Mechanical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan (e-mail: [email protected]). J.-C. Chen is with the Senao International Co. Ltd., Taoyuan 333, Taiwan. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVT.2012.2226218

This study splits general LPR applications into three categories: access control (AC), traffic law enforcement (LE), and road patrol (RP). Each category is characterized by variables, such as camera viewpoint, plate size in the image, and illumination condition, with application-oriented scopes of variation. Applications with variables of larger variation scopes require more sophisticated processing and higher computational cost than those with smaller variation scopes. It is inefficient to apply the methods developed for RP scenarios to handle AC. Conversely, a method developed for AC cannot solve most cases in RP. This paper redefines the LPR problem with the variables and their variation scopes for the aforementioned three major applications.

Although LPR has been an active research topic for more than a decade, the lack of benchmark databases is often acknowledged [6], [13]. Most works report performance evaluated on proprietary data sets, which are rarely available for a performance comparison. Many reviews can only cite the reported performance without being able to validate it [6], [8], [12]. Among the data sets available to the public, some offer only a limited number of samples or a limited scope of variables [10]. An appropriate one is given by Anagnostopoulos et al. [6], which offers 741 images with variables of different variation scopes. It is, however, not sufficient for a performance study in line with the aforementioned three major applications. This paper introduces the first version of the application-oriented license plate (AOLP) benchmark database, which has 2049 images categorized into three subsets, and each subset offers a good scope of samples to represent one of the three major applications. All samples were collected in Taiwan, at various locations and times and under various traffic and weather conditions. The database can be accessed from the website http://140.118.199.117/LPR/AOLP. Fig. 1 shows a few samples from the three subsets of the database.

To validate the proposition that LPR is better handled in an application-oriented way, a solution with settings and parameters that are adjustable for different applications is proposed. Similar to most LPR approaches, the proposed solution consists of three modules for plate detection, character segmentation, and recognition. Novelty exists in each module. For example, this is the first time that clustering is applied to plate detection with the number of clusters determined by an approach proposed in this paper. This is also the first time that the maximally stable extremal region (MSER) is proven to be an effective feature for character segmentation, although it was originally proposed for finding stereo correspondences [14]. Although the features and the classifier in the character recognition module have been used in previous research, the localized features in the enhancement level and the addition of a null class make the

0018-9545/$31.00 © 2012 IEEE

HSU et al.: APPLICATION-ORIENTED LICENSE PLATE RECOGNITION

Fig. 1. Samples from the AOLP database.

proposed module a promising one. The performance of each module is compared with a couple of competitive methods to justify its role as a reference solution for future benchmark tests. A thorough comparison of LPR methods is beyond the scope of this paper; it instead aims at formulating LPR as an application-oriented problem with a publicly accessible database and proposing a solution with settings that are adjustable for different applications. Advancing from our previous work [15], where only preliminary results were presented, this paper matures all solution modules and reports an extensive experimental study based on a more rigorous evaluation criterion. The contributions of this paper can be summarized as follows:
• the introduction of the AOLP database that has three subsets, where each subset is composed of samples covering variables with sufficient variation scopes;
• an LPR solution with settings that are adjustable for different applications and that is experimentally proven to be competitive with existing approaches.

The rest of this paper is organized as follows. Section II specifies the variables and their variation scopes for the three applications, and introduces the AOLP database. Section III presents the proposed solution with its three modules. An extensive experimental study is reported in Section IV that reveals the advantages of tackling LPR in the application-oriented way, and compares the performance of the proposed solution to other competitive approaches. A conclusion to this paper is given in Section V.

II. MAJOR APPLICATIONS AND THE APPLICATION-ORIENTED LICENSE PLATE DATABASE

The major applications of LPR can be split into three categories: AC, traffic LE, and RP. AC refers to the cases in which a vehicle passes a fixed passage at a reduced speed or with a full stop, such as at a toll station or the entrance/exit of a region. Considering the variables shown in Fig. 2, in AC scenarios, the camera is often placed less than 5 m from the plate, within ±30° in pan and 0°–60° in tilt (where 0° tilt is parallel to the ground). In the image, the width of a plate is between 0.2 and 0.25 of the width of the image (shown as width ratio in Table I), and its orientation is less than 10°. (Note that both are measured in the image on the plate projected onto the image plane.) The given parameters are generalized from the 681 images collected at various AC scenes. The illumination covers indoor, outdoor, daytime, nighttime, and different weather conditions. If measured by the average intensity over a plate, it varies from 60 to 130 on an 8-bit grayscale. The given parameters for AC are summarized in Table I, along with those for LE and RP. LE refers to the cases in which a vehicle violates traffic laws and is captured by a roadside camera. For this application

Fig. 2. Distance determines the plate size in the image, pan and tilt determine the out-of-plane rotation of the plate projected onto the image, and orientation refers to the in-plane rotation of the view.

TABLE I
VARIABLES AND VARIATION SCOPES FOR THREE MAJOR APPLICATIONS, GENERALIZED FROM THE AOLP DATABASE

category, 757 image samples were collected. RP refers to the cases that the camera is installed or handheld on a patrolling vehicle, which takes images of vehicles with arbitrary viewpoints and distances. The purposes of RP include searching for lost vehicles, security checking in a restricted area, etc. For this application, 611 images were collected. Table I shows that AC is with variables of narrower ranges of variation, whereas RP is with variables of wider ranges of variation. The latter is expected to take a longer time for plate detection as it comes with a larger search space to go through in the plate detection phase. Although LE and RP have a few variables with similar ranges of variation, the most significant difference between them is on the size, pan, and orientation. Since the LE samples were mostly collected from cameras installed on roadsides with a constant distance and viewing angle, the variation on size, pan, and orientation is smaller than those collected by mobile cameras in the RP scenarios.
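The AC variation scopes quoted in this section can be written down as a small configuration, which makes it easy to check whether a given capture falls inside the AC setting. The following is only a sketch under stated assumptions: the class `ACScope`, the function `fits_ac`, and the dictionary keys are hypothetical names introduced for illustration, and only the AC values are encoded because the LE and RP values of Table I are not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACScope:
    """AC variation scopes as quoted in the text (names are illustrative)."""
    max_distance_m: float = 5.0           # camera placed less than 5 m from the plate
    pan_deg: tuple = (-30.0, 30.0)        # within +/-30 degrees in pan
    tilt_deg: tuple = (0.0, 60.0)         # 0 to 60 degrees in tilt
    width_ratio: tuple = (0.2, 0.25)      # plate width / image width
    max_orientation_deg: float = 10.0     # in-plane rotation below 10 degrees
    mean_intensity: tuple = (60, 130)     # average 8-bit intensity over the plate


def in_range(value, bounds):
    """Return True if value lies inside the closed interval bounds."""
    low, high = bounds
    return low <= value <= high


def fits_ac(sample, scope=ACScope()):
    """Check a capture (a dict with hypothetical keys) against the AC scopes."""
    return (
        sample["distance_m"] < scope.max_distance_m
        and in_range(sample["pan_deg"], scope.pan_deg)
        and in_range(sample["tilt_deg"], scope.tilt_deg)
        and in_range(sample["width_ratio"], scope.width_ratio)
        and abs(sample["orientation_deg"]) < scope.max_orientation_deg
        and in_range(sample["mean_intensity"], scope.mean_intensity)
    )


gate_capture = {
    "distance_m": 3.0, "pan_deg": 10.0, "tilt_deg": 30.0,
    "width_ratio": 0.22, "orientation_deg": 5.0, "mean_intensity": 100,
}
patrol_capture = dict(gate_capture, width_ratio=0.08, orientation_deg=25.0)
print(fits_ac(gate_capture), fits_ac(patrol_capture))
```

A capture that fails this check would, in the spirit of the paper, be handled with the wider LE or RP settings instead.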

III. PROPOSED SOLUTION

To verify the proposition that LPR is better tackled in an application-oriented way, a solution with settings that are adjustable for different applications is proposed and tested on the AOLP database. Similar to most LPR solutions, the proposed solution is also composed of three modules: plate detection, character


segmentation, and recognition. The novelty in each module is highlighted as follows:
• Edge clustering for plate detection: Although image edges are a popular feature used for plate detection, this is the first time that they are clustered using the Expectation–Maximization (EM) [16] algorithm with the number of clusters determined by a method proposed in this paper. Experiments show that the method can effectively estimate the number of clusters, facilitating the clustering algorithm.
• MSER-based character segmentation: This is the first time that MSER is applied for character segmentation. The MSER detector can efficiently identify local regions almost invariant to illumination variation. The characters on a license plate are often painted with colors of high contrast to the plate backgrounds, making them good targets for the MSER detector.
• Classifier with a null class for character recognition: Previous character classifiers only consider alphanumeric characters, but we add a null class of noncharacters, which can be caused by plate edges, badly segmented characters, and others. This addition is experimentally proven to be effective in improving the character recognition rate.

A. Detection of Plate Candidates

Edges are one of the popular features used for plate detection [6], as license plates reveal dense sets of edges when processed by an edge detector. Similar to the plate detection in [7], [8], [17], and [18], a Sobel detector for extracting vertical edges is used in our detection module, which takes a grayscale image as input. Our experiments show that vertical edges are better than horizontal edges, and even better than the combination of both, in locating the plates. Different from most approaches that extract the edges and process them by heuristic scanning windows or morphological methods [1], [7], [8], [17], [18], the proposed approach applies a clustering method that extracts the regions with dense sets of edges and with shapes similar to plates.
An immediate advantage of using clustering is its extensive coverage of different scales, orientations, and rotations in one session. Assuming that the edges are distributed according to the following Gaussian mixture model (GMM), we apply the EM algorithm for the edge clustering:

$$p(x_i) = \sum_{j=1}^{N_c} w_j \, \phi(x_i \mid \mu_j, V_j) \tag{1}$$

where p(xi) is the probability of the edge point xi, φ(xi | μj, Vj) is the jth Gaussian, weighted by wj and with mean μj and covariance Vj, and Nc is the predetermined number of clusters. The determination of Nc is addressed subsequently. Equation (1) is subject to the constraint that the weights wj, j = 1, ..., Nc, must sum to unity. Given the edge map [xi], the EM algorithm can be applied to obtain μj and Vj for each Gaussian cluster [16]. A cluster is considered a plate candidate if its edge density De is larger than a threshold DT and it has a platelike shape. The plate-shape likeness

can be measured by the following parameters extracted from the cluster covariance Vj:

$$s_m < \frac{\lambda_{i,M}}{W_I} < s_M \tag{2}$$

$$r_\lambda < \frac{\lambda_{i,M}}{\lambda_{i,m}} < R_\lambda \tag{3}$$

$$\theta_m < \tan^{-1}\!\left(\frac{v_{i,M}^{(y)}}{v_{i,M}^{(x)}}\right) < \theta_M \tag{4}$$

[...] δ for a preselected threshold δ; otherwise, continue.
6. Increase Nc, and repeat steps 2–5 to generate a new set of clusters. Keep the clusters that satisfy (2)–(4).
7. Output: Plate candidates, which are the clusters invariant across a range of Nc.
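Two pieces of the detection module lend themselves to a short illustration: evaluating the mixture density in (1) for a 2-D edge point, and keeping only the clusters that stay in place as Nc increases (the output step of the algorithm). The sketch below assumes 2-D points and 2 × 2 covariances; the function names, the pixel tolerance `tol`, and the rule that two clusters match when their centres lie within that tolerance are assumptions made for illustration, not details taken from the paper. The default of three successive runs follows the caption of Fig. 3.

```python
import math


def gaussian_2d(x, mean, cov):
    """Density of a 2-D Gaussian; cov is [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d^T V^{-1} d using the closed-form 2x2 inverse.
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def mixture_density(x, weights, means, covs):
    """Evaluate p(x) = sum_j w_j * phi(x | mu_j, V_j), with weights summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to unity")
    return sum(w * gaussian_2d(x, m, v) for w, m, v in zip(weights, means, covs))


def stable_candidates(runs, min_runs=3, tol=5.0):
    """Return cluster centres that persist over min_runs successive values of Nc.

    runs: one list of (cx, cy) centres per value of Nc, in increasing Nc order.
    ASSUMPTION: two clusters are 'the same' if their centres are within tol pixels.
    """
    stable = []
    previous = []  # (centre, streak) pairs from the previous run
    for centres in runs:
        current = []
        for centre in centres:
            streak = 1
            for old_centre, old_streak in previous:
                if math.dist(centre, old_centre) <= tol:
                    streak = max(streak, old_streak + 1)
            current.append((centre, streak))
            if streak == min_runs:  # report each persistent cluster once
                stable.append(centre)
        previous = current
    return stable


identity = [[1.0, 0.0], [0.0, 1.0]]
p = mixture_density((0.0, 0.0), [0.5, 0.5], [(0.0, 0.0), (0.0, 0.0)], [identity, identity])
runs = [[(100, 50), (10, 10)], [(101, 50)], [(100, 51), (200, 200)], [(100, 50)]]
print(p, stable_candidates(runs))
```

In a full pipeline the centres and covariances would come from an EM fit at each Nc; here they are typed in by hand so that the filtering logic can be read in isolation.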

B. Character Segmentation

Given a plate candidate from the detection module, an MSER detector [14] is exploited to extract possible characters. The MSER was originally proposed for finding correspondences across viewpoints [14], [19]. This is the first time that it is applied for character segmentation. Because of its capability of rendering persistent edges around objects as illumination varies, it was assumed and later experimentally proven to be effective in segmenting the characters, which often reveal edges robust to illumination variation. The extraction of MSER considers the set of all possible thresholds that are able to binarize intensity image I(x) into binary image EtM(x) as follows:

$$E_{t_M}(x) = \begin{cases} 1, & \text{if } I(x) \le t_M \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where tM is the threshold on intensity. An MSER is a connected region in EtM(x), with little change in its size for a range of thresholds, which is extracted with a watershed-like segmentation algorithm. The number of thresholds that maintain the connected region similar in size is known as the margin of the region. The plate characters are bloblike objects, and the MSER usually anchors on the boundaries of such objects. Because the MSER is defined exclusively by the intensity function in the region and its outer border, and the local binarization is stable over a large range of thresholds, it possesses the following characteristics.
• The region is closed under continuous (and thus projective) transformation, indicating that it is affine invariant, regardless of whether the image is warped or skewed. This makes it good for locating the characters when the plate is rotated.
• The region is closed under monotonic transformation of intensity, reflecting that photometric changes have no effect on it; therefore, it is robust to illumination variation.
• The detection can be performed across multiple scales without the need for smoothing. Therefore, both fine and large regions can be discovered, which is good for segmentation across sizes of plates.
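The idea behind (6) and the margin of a region can be illustrated on a toy image. The sketch below is a deliberate simplification and not the watershed-like MSER extraction used in the paper: it binarizes the image at a list of sampled thresholds, measures the 4-connected region that contains a chosen seed pixel, and reports the longest run of successive thresholds over which the region size stays nearly constant. The function names, the seed-based formulation, and the 10% size tolerance are all assumptions made for illustration.

```python
from collections import deque


def binarize(image, t):
    """E_t(x) = 1 if I(x) <= t, else 0, as in (6)."""
    return [[1 if value <= t else 0 for value in row] for row in image]


def region_size(binary, seed):
    """Size of the 4-connected foreground region containing seed (0 if seed is off)."""
    rows, cols = len(binary), len(binary[0])
    if not binary[seed[0]][seed[1]]:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and binary[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def stable_threshold_run(image, seed, thresholds, max_rel_change=0.1):
    """Return (run_length, size) for the longest run of sampled thresholds
    over which the seed's region changes size by at most max_rel_change."""
    best = (0, 0)
    run, prev = 0, None
    for t in thresholds:
        size = region_size(binarize(image, t), seed)
        if size and prev and abs(size - prev) <= max_rel_change * prev:
            run += 1
        else:
            run = 1 if size else 0
        if run > best[0]:
            best = (run, size)
        prev = size
    return best


# A dark 2x2 "character" (intensity 20) on a bright background (intensity 200).
toy = [
    [200, 200, 200, 200],
    [200, 20, 20, 200],
    [200, 20, 20, 200],
    [200, 200, 200, 200],
]
print(stable_threshold_run(toy, (1, 1), range(0, 256, 20)))
```

The dark block keeps its four-pixel size for every sampled threshold between its own intensity and the background intensity, which is the kind of persistence that makes high-contrast plate characters good MSER targets.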
The aforementioned properties are demonstrated in the segmentation results shown in Fig. 4. Although experiments show that MSER works well in capturing all characters under various conditions from the three subsets, a special property attracts our attention. It is highly efficient for the detection of characters with half or fully closed regions, including 0, 2, 3, 4, 5, 6, 8, 9, A, B, C, D, E, F, G, P, Q, R, S, U, V, W, Y, and Z, particularly those with holes in them, such as 0, 4, 6, 8, 9, A, B, D, P, Q, and R. This does not mean that MSER is good only for these characters; it is also good for capturing others, but it shows an interesting pattern when capturing these characters. These characters and others can be captured by MSER+, which refers to the region changing from dark to bright across the MSER boundary. However, those with half or full holes are often captured by MSER−, which refers to the region changing from bright to dark across the MSER boundary. MSER− can be easily obtained by running the MSER detection on the given image with intensity inverted. When both MSER+ and MSER− are detected, one can


Fig. 3. Variation of the cost JL as the number of clusters Nc increases. When Nc = 5, the cluster enclosing the plate and other edges is too large to be a candidate. When Nc ≥ 8, JL tends to increase, and the cluster enclosing the plate appears to stay at the same location with similar size. Clusters with yellow contours are rejected by size, those with red contours are rejected by shape, and those with blue contours are rejected by density. Only the green contours that stay stationary for several (default: 3) successive values of Nc are considered valid candidates. This rule filters out many green contours that appear only once, at Nc = 11, 14, and 20.

Fig. 4. Regions detected by the MSER detector in three scenarios. Those with green contours are the MSER+ selected by Algorithm 2, those with blue contours are MSER− enclosed by an MSER+, those with red contours are rejected by the algorithm, and those with yellow bounding boxes are the segmented characters.

double-check the size and orientation of the character, making the segmentation more efficient and precise. Each MSER+ must be verified by some criteria on its aspect ratio and orientation, which are again application oriented. The aspect ratio is given by ra ≡ LM/Lm, where LM is the major axis, and Lm is the minor axis of the smallest ellipse that encloses the MSER+. The orientation is determined solely by the major axis vector $\vec{L}_M$, and LM = |$\vec{L}_M$|. The criteria are as follows.
• For AC, ra ∈ [1.6, 2.5], 80° < ∠$\vec{L}_M$ < 100°, and 0.75Lp < LM < Lp.
• For LE, ra ∈ [1.2, 3.6], 72° < ∠$\vec{L}_M$ < 108°, and 0.65Lp < LM < Lp.
• For RP, ra ∈ [1.2, 4.2], 60° < ∠$\vec{L}_M$ < 120°, and 0.5Lp < LM < Lp.

In the above criteria, Lp is the height of the plate candidate. The details of the segmentation algorithm are presented in Algorithm 2. Algorithm 2 combines the MSER with the criteria that the heights, widths, and orientations of the segmented characters and the distances between them must be close to each other, and the number of the segmented characters must be as required. Experiments show that most nonplate regions mistaken as plate candidates in the detection phase, such as cluttered backgrounds and shadows, are removed in this phase. Fig. 4 shows several typical cases from the three subsets, where the regions with green contours are the MSER+ selected by Algorithm 2, those with blue contours are the MSER−

enclosed by an MSER+, those with red contours are rejected by Algorithm 2, and those with yellow bounding boxes are the outputs, i.e., the segmented characters. Most of the selected samples have large orientation and rotation, such as AC-1, LE-1, RP-1, and RP-2, or different illumination conditions, such as the shadow in AC-2. Nevertheless, the proposed MSER-based segmentation performs well in all cases. Compared with previous methods, such as the most popular intensity projection [5], [8], [12], [20] and connected components [6], [12] (note that [12] has just recently been published), the MSER performs much better in capturing characters with large orientation and rotation, and its performance is robust against illumination variation.

Algorithm 2 MSER-based character segmentation
1. Input: A plate candidate I
2. Binarize I to Et with threshold varying from a minimum tm to maximum tM.
3. MSERs are given by connected regions with marginal size change over thresholds ta–tb, [ta, tb] ⊆ [tm, tM]; both MSER+ and MSER− are considered valid.
4. Depending on the application, keep those with aspect ratio ra, orientation ∠($\vec{L}_M$), and major axis LM within the scopes specified earlier.
5. If an MSER− is enclosed in an MSER+, the $\vec{L}_M$ and $\vec{L}_m$ in the MSER+ are taken as the primary axes. Go to Step 8.
6. Else select the MSER+ with similar $\vec{L}_M$, $\vec{L}_m$, and orientation; the primary axes are given by the means of {$\vec{L}_M$} and {$\vec{L}_m$} from the selected MSER+.
7. end if
8. Replace the $\vec{L}_M$ and $\vec{L}_m$ in each MSER+ by the primary axes. Extend the primary axes in each MSER+ to the four edge points where the intensity jumps. The region segmented by the rectangular bounding box B formed by the extended primary axes is considered a character.
9. Output: Regions segmented for character recognition.
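The application-oriented criteria on aspect ratio, orientation, and major-axis length that step 4 of Algorithm 2 relies on can be captured in a small lookup table. The following sketch encodes the three sets of bounds quoted in the text; the names `CRITERIA` and `accept_mser`, and the choice of passing the ellipse axes and angle as plain numbers, are assumptions made for illustration.

```python
# Bounds quoted in the text: aspect-ratio range, orientation range in degrees,
# and the minimum major-axis length as a fraction of the plate height Lp.
CRITERIA = {
    "AC": {"ra": (1.6, 2.5), "angle": (80.0, 100.0), "min_len": 0.75},
    "LE": {"ra": (1.2, 3.6), "angle": (72.0, 108.0), "min_len": 0.65},
    "RP": {"ra": (1.2, 4.2), "angle": (60.0, 120.0), "min_len": 0.5},
}


def accept_mser(app, major, minor, angle_deg, plate_height):
    """Return True if an MSER+ with the given enclosing-ellipse axes passes
    the criteria of application app ('AC', 'LE', or 'RP')."""
    bounds = CRITERIA[app]
    ra = major / minor  # aspect ratio r_a = L_M / L_m
    ra_ok = bounds["ra"][0] <= ra <= bounds["ra"][1]
    angle_ok = bounds["angle"][0] < angle_deg < bounds["angle"][1]
    length_ok = bounds["min_len"] * plate_height < major < plate_height
    return ra_ok and angle_ok and length_ok


# A slanted, elongated region: rejected under AC, accepted under LE and RP.
for app in ("AC", "LE", "RP"):
    print(app, accept_mser(app, major=30.0, minor=10.0, angle_deg=75.0, plate_height=40.0))
```

Keeping the bounds in one table reflects the paper's point that the same module can serve all three applications by switching settings rather than code.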


C. Character Recognition

Each character candidate obtained from the previous segmentation phase comes with an aspect ratio ra and orientation ∠$\vec{L}_M$. If ra is not in the range [1.6, 2.5] or ∠$\vec{L}_M$ is not within [80°, 100°], showing that the character may have a large viewpoint rotation, an affine transform is applied to warp it to within the desired ranges of ra and ∠$\vec{L}_M$. The character candidate with appropriate aspect ratio and orientation is normalized to 30 × 15 pixels in scale and recognized by a hierarchical classifier with two processing layers. The first layer has 25 classes that cover all 35 alphanumeric characters¹ and one class that covers the noncharacters caused by plate edges, badly segmented characters, or segments from cluttered backgrounds, giving a total of 26 classes. Out of the 25 alphanumeric classes, seven classes have an enhancement classifier in the second layer that targets a set of characters with high misclassification likelihoods. The seven sets are [B, R, 6, 8], [C, G], [D, 0], [E, F], [I, 1], [U, V], and [Z, 2, 7]. The remaining 18 classes in the first layer each identify a specific character. The selection of the seven easy-to-misclassify sets is based on the classification results from the experiments that only use the first-layer classifier, which was originally designed to identify the 35 classes of alphanumeric characters. A unique difference of our design from the previous treelike classifiers, such as those in [21] and [22], is the addition of a null class for noncharacters. When a character candidate is classified as a noncharacter, its neighborhood areas are searched, segmented, and recognized again. This scheme is experimentally proven effective in reducing the missegmentation rate and, in turn, increasing the recognition rate. The local binary pattern (LBP) features [23] are extracted and classified using a linear discriminant analysis (LDA) classifier [24]. Details are given in Algorithm 3.
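The class bookkeeping of the two-layer design can be checked with a few lines of code: 35 alphanumeric characters (no letter "O"), seven grouped classes, 18 single-character classes, and one null class. The sketch below also shows how a candidate might be routed through the two layers. The function `recognize` and its callable arguments `first_layer` and `second_layers` are hypothetical stand-ins for trained classifiers, and returning `None` for the null class is an assumption standing in for the re-segmentation step described in the text.

```python
import string

# 10 digits plus 25 letters; the letter "O" is excluded, as noted in the paper.
ALPHANUMERIC = [c for c in string.digits + string.ascii_uppercase if c != "O"]
CONFUSABLE_SETS = ["BR68", "CG", "D0", "EF", "I1", "UV", "Z27"]
NULL = "null"


def first_layer_classes():
    """List the first-layer classes: 7 grouped sets, the single characters, and null."""
    grouped = set("".join(CONFUSABLE_SETS))
    singles = [c for c in ALPHANUMERIC if c not in grouped]
    return CONFUSABLE_SETS + singles + [NULL]


def recognize(candidate, first_layer, second_layers):
    """Route a candidate through the two layers.

    first_layer: callable returning one of first_layer_classes() (hypothetical).
    second_layers: dict mapping a grouped class to its enhancement classifier.
    Returns the character, or None when the candidate is a noncharacter.
    """
    label = first_layer(candidate)
    if label == NULL:
        return None  # caller would re-search the neighbourhood and retry
    if label in second_layers:
        return second_layers[label](candidate)
    return label


classes = first_layer_classes()
print(len(ALPHANUMERIC), len(classes))
```

Counting the classes this way reproduces the totals stated in the text, which is a quick consistency check on the seven sets.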
The LBP features are used in the first layer. Because the characters in each easy-to-misclassify set differ from each other locally, only the corresponding distinctive local regions are considered in the second layer. For example, the regions on the bottom right corners are extracted for distinguishing C from G, and those on the left are extracted for distinguishing D from 0. Both layers adopt LDA classifiers. Given K classes of LBP features with dimension reduced by principal component analysis (PCA), the LDA classifier searches for the discriminant normal vector w∗, such that w∗ is perpendicular to the discriminant hyperplane and the ratio of the between-class scatter Sb to the within-class scatter Sw is maximized. The desired w∗ is the generalized eigenvector corresponding to the largest eigenvalue of the generalized eigenvalue problem Sb w = λSw w. The overall recognition is summarized in Algorithm 3.

¹ The AOLP database does not have a sufficient number of samples of “O” because the authorities have stopped issuing plates with “O” to avoid confusion with the number “0.” We thus consider the remaining 25 letters and ten numbers.

The reason to choose LDA rather than other supervised learning classifiers, such as neural networks and support vector machines (SVMs), is that it outperforms the others in handling cases with a limited number of samples in the training set. For example, given the characters I, L, X, and Y, which are


with only a few samples collected in the AOLP database, our experiment shows that the neural network and the SVM are outperformed by the LDA in the recognition rate. The fact that these classifiers require a sufficient number of training samples to guarantee their performance is also noted in [1] and [8]. A flowchart given in Fig. 5 summarizes the overall proposed system with three modules, and the input, output, and processing units in each module.

Algorithm 3 LDA with LBP for character recognition
1. Input: A segmented character I and a 3 × 3 mask
2. If needed, apply an affine transform to warp I so that its ra ∈ [1.6, 2.5] and ∠$\vec{L}_M$ ∈ [80°, 100°].
3. Normalize I to 30 × 15, and partition it into 10 × 5 cells.
4. For each pixel x, compute its binary pattern L (gc is its intensity and g0∼7 are the intensities of its eight neighboring pixels)

$$L = \sum_{p=0}^{7} s(g_p - g_c) \times 2^p, \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}$$
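The binary pattern in step 4 can be computed directly from a 3 × 3 neighborhood. The sketch below assumes that the neighbors g0 to g7 are visited clockwise starting from the top-left pixel; the text does not specify the ordering, so this choice (and the function name `lbp_code`) is an assumption made for illustration. A different ordering permutes the bits but does not change the idea.

```python
def lbp_code(patch):
    """Local binary pattern of the centre pixel of a 3x3 patch.

    patch: 3x3 nested list of intensities, centre at patch[1][1].
    ASSUMPTION: neighbours g0..g7 run clockwise from the top-left pixel.
    """
    gc = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    # s(g_p - g_c) is 1 when the neighbour is at least as bright as the centre.
    return sum((1 if gp - gc >= 0 else 0) << p for p, gp in enumerate(neighbours))


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_code(flat), lbp_code(peak), lbp_code(corner))
```

Because the code depends only on the sign of intensity differences, it is unchanged by any monotonic brightening or darkening of the patch, which suits plates captured under varying illumination.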
