
IEEE SIGNAL PROCESSING LETTERS, VOL. 16, NO. 1, JANUARY 2009

Graphical Models for Joint Segmentation and Recognition of License Plate Characters

Xin Fan, Member, IEEE, and Guoliang Fan, Senior Member, IEEE

Abstract—We formulate the issue of joint image segmentation and recognition as an integrated statistical inference problem. A two-layer graphical model is proposed that supports optimal segmentation and recognition in a unified Bayesian framework. Owing to the explicit modeling of the two tasks in the graphical model, an efficient non-iterative belief propagation algorithm can be used for state estimation. The proposed approach is applied to automatic license plate recognition (ALPR), and it outperforms traditional methods in which the two tasks are implemented independently and sequentially.

Fig. 1. Two-layer Markov network to integrate segmentation and recognition.

Index Terms—Automatic license plate recognition, belief propagation, character segmentation, graphical models.

Manuscript received July 16, 2008; revised September 29, 2008. Current version published December 12, 2008. This work was supported in part by the National Science Foundation (NSF) under Grant IIS-0347613 and in part by the Army Research Office (ARO) under Grant W911NF-08-1-0293. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lisimachos Paul Kondi.

X. Fan is with the School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078 USA, and also with the School of Information Engineering, Dalian Maritime University, Dalian, China (e-mail: [email protected]).

G. Fan is with the School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LSP.2008.2008486

I. INTRODUCTION

SEGMENTATION and recognition are two important tasks in signal/image processing and computer vision. Traditionally, these two tasks were implemented in a cascade fashion, independently and sequentially: the segmentation module captures low-level features, e.g., color, intensity, and edges [1], [2], and groups the features into regions that serve as the inputs of the object recognition module [3]. Recently, there has been increasing interest in exploring the interaction between the two tasks. For example, prior knowledge of the characters to be recognized is employed for segmentation in [4], and the recognition outputs are fed back to the segmentation process in [5]. These attempts exploit prior knowledge or recognition results to validate the segmented candidates through accept/reject decisions or to fine-tune the segmentation. However, these feedback mechanisms may face a chicken-and-egg dilemma: good recognition can facilitate good segmentation, which is itself the prerequisite for good recognition. Therefore, researchers advocate formulating joint segmentation and recognition in a unified framework, where the two tasks can be optimized simultaneously. A similar idea can be found in [6] and [7], where a segmental hidden Markov model (SHMM) was proposed for speech recognition that involves a segmental observation model. SHMMs can be trained by the expectation maximization (EM) algorithm, which has to be approximated because of the segmental modeling.

In the vision community, graphical model-based approaches are often used for joint segmentation and recognition problems. A unified segmentation, detection, and recognition paradigm was proposed in [8]; it involves parsing graphs to represent the general scene and can be solved by Markov chain Monte Carlo (MCMC) methods, which are computationally expensive owing to the complicated state space. Also, a tree-structured model was used for object detection and recognition in [9]; it encodes topological constraints among parts and integrates detection and recognition in one framework.

In this letter, we study joint segmentation and recognition in the context of automatic license plate recognition (ALPR), which is a specific OCR application analogous to speech recognition. Inspired by [10], we propose a two-layer Markov network to integrate the two tasks into one statistical framework that involves the explicit modeling of both segmentation and recognition [11]. Both low-level cues and high-level prior knowledge are incorporated in terms of probabilities. The problem of joint segmentation and recognition then becomes the estimation of the posterior probabilities of two types of latent variables, which can be achieved by a simple yet efficient belief propagation (BP) algorithm. In the case of ALPR, the proposed approach outperforms methods in which the two tasks are performed individually and sequentially. The proposed method can be extended to other OCR applications.

II. PROBLEM FORMULATION

A. Graphical Modeling of Joint Segmentation and Recognition

As shown in Fig. 1, we propose a two-layer Markov network to formulate the joint segmentation and recognition problem in a 1-D case, where three kinds of nodes are defined.
The two layers of latent nodes (the white circles) represent the segmentation and recognition variables, respectively: a segmentation variable indicates the location of a segmented object of interest (OOI), and the corresponding recognition variable denotes the class label of that OOI. The black circles represent the low-level features, and the gray ones, which are dummy variables, embody the compositional prior information of the recognition variables.

1070-9908/$25.00 © 2008 IEEE

Given the OOIs in an image, each OOI is assigned one segmentation variable and one recognition variable, and these form the two sets of latent variables. The problem is formulated as a maximum a posteriori (MAP) problem, i.e., finding the segmentation and recognition variables that maximize their respective posterior probabilities. Within the Markov network framework, the joint probability of the two sets of latent variables and the observations is represented as a product of dependent functions,

(1)

in which the pairwise terms of each segmentation node run over its neighboring (segmentation) nodes. The functions that link two latent variables are the compatible functions, and the functions that link a latent variable to the observations or to the prior information are the potential functions. These functions encode the bottom-up and top-down information in the network, as discussed below.

• The potential function of a segmentation node represents how likely the ith OOI is to lie at a certain location given the image; it captures the low-level features of the OOI.
• The compatible function between a segmentation node and its recognition node evaluates the probability that the ith OOI, located at the given position, is recognized as a given class. These values can be calculated by any recognition algorithm.
• The compatible function between adjacent segmentation nodes specifies the spatial distance between neighboring OOIs.
• The potential function of a recognition node describes the probability of the class label of the ith OOI with respect to the prior information. This potential can be specified or learned a priori according to the compositional semantics.

Given the joint probability in (1), we can use the BP algorithm to marginalize it, i.e., to obtain the posterior probabilities of the individual segmentation and recognition variables for joint segmentation and recognition. It is worth noting that, in this letter, the number of nodes and the potential and compatible functions are specified from an analysis of the OOIs in an ALPR application. However, these parameters can be learned automatically for a specific application. The proposed graphical model exhibits the same topological (tree) structure as that in [9], but we explicitly model the segmentation and recognition variables. Maximum likelihood (ML) estimators similar to those in [9] can be used for learning the parameters from training data.

B. Inference Algorithm

BP inference can be used for an acyclic graphical model [10], such as our Markov network. The BP algorithm accumulates local computations to circumvent the global computation; its key step is to recursively update the local messages between adjacent latent nodes, both within the segmentation layer and across the two layers.
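As a compact reference for the model and the inference step described above, the factorization and the message updates can be written as follows. This is a sketch under assumed notation rather than a transcription of the letter's numbered equations: $s_i$ and $r_i$ stand for the segmentation and recognition variables of the $i$th OOI, $I$ for the image, $N$ for the number of OOIs, $E$ for the set of edges between adjacent segmentation nodes, $\mathcal{N}(i)$ for the segmentation neighbors of node $i$, $\phi$ for the potential functions, $\psi$ for the compatible functions, and $m_{a \to b}$ for a message from node $a$ to node $b$. The updates are the standard sum-product forms for a tree-structured network, in the spirit of [10].

```latex
% Joint probability as a product of potential and compatible functions
P(\mathbf{s}, \mathbf{r}, I) \propto
  \prod_{i=1}^{N} \phi(s_i, I)\, \phi(r_i)\, \psi(s_i, r_i)
  \prod_{(i,j) \in E} \psi(s_i, s_j)

% Message from a recognition node to its segmentation node
m_{r_i \to s_i}(s_i) = \sum_{r_i} \psi(s_i, r_i)\, \phi(r_i)

% Message between adjacent segmentation nodes (from s_j to s_i)
m_{s_j \to s_i}(s_i) = \sum_{s_j} \psi(s_j, s_i)\, \phi(s_j, I)\,
  m_{r_j \to s_j}(s_j) \prod_{k \in \mathcal{N}(j) \setminus i} m_{s_k \to s_j}(s_j)

% Message from a segmentation node to its recognition node
m_{s_i \to r_i}(r_i) = \sum_{s_i} \psi(s_i, r_i)\, \phi(s_i, I)
  \prod_{j \in \mathcal{N}(i)} m_{s_j \to s_i}(s_i)

% Beliefs (posterior marginals up to normalization)
b(s_i) \propto \phi(s_i, I)\, m_{r_i \to s_i}(s_i)
  \prod_{j \in \mathcal{N}(i)} m_{s_j \to s_i}(s_i)
b(r_i) \propto \phi(r_i)\, m_{s_i \to r_i}(r_i)
```

Each message excludes the contribution of the node it is sent to, which is what makes the beliefs exact marginals on a tree.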
Specifically, we have three kinds of message passing, two of which are illustrated in Fig. 2. The message from a recognition node to its associated segmentation node, which conveys the validation information about recognition, is calculated by

(2)

Fig. 2. Computation of message passing: (a) from a segmentation node to its recognition counterpart and (b) between two adjacent segmentation nodes, where the dashed and solid lines represent the messages to be passed and the available messages, respectively.

We compute the message from a segmentation node to its recognition counterpart, carrying low-level features and object location constraints [as shown in Fig. 2(a)], as

(3)

where the message passed between adjacent segmentation nodes [as shown in Fig. 2(b)] is recursively evaluated by

(4)

where the index set contains the subscripts of the (segmentation) nodes adjacent to the sending node, excluding the receiving node. Once all the messages are available, the marginal probability of a node is computed as its “belief,”

(5)

(6)

The algorithm converges when all nodes have been visited. The standard BP algorithm requires discrete probabilities. In our case, the variables in the recognition layer are discrete, and we also discretize the segmentation variables.

III. APPLICATION TO ALPR

As a case study, we demonstrate the usefulness of the proposed model in the ALPR application. Specifically, our study is based on license plate images acquired in a city in China. We assume that the position and the size of the license plate in an image are determined a priori by a plate localization algorithm. The license plate images have been rectified by a skew correction algorithm so that the positions of the characters can be described by a 1-D variable, but some distortions and corruptions may be present in the input images. An example of the input image is shown in Fig. 3.

Fig. 3. Example of (top) an input image and (bottom) its projected edge histogram (smoothed by a 1-D Gaussian kernel).

We employ the constraints of license plate images to define the compatible and potential functions. The following semantic attributes of a license plate are incorporated.

• In this study, the license plates have six characters. This attribute specifies the topology of the two-layer Markov network, which has six nodes in each layer.
• The characters of a license plate have compositional semantics [5] that can be represented in the form of potential functions of the recognition nodes. For example, in our case, the first character is an alphabetical character and the last three are digits. The potential functions of the first node and of the last three nodes (the fourth, fifth, and sixth) can then be specified as piecewise functions of whether the class label is a digit,

(7)

and

(8)

respectively, or we can count the occurrence frequency in a database to estimate these potentials.
• Each character in a license plate image has high contrast. The projected edge histogram, which is obtained by detecting edges and counting the edge points in each column (Fig. 3), is often used for character segmentation [12], [13], as in

(9)

which is written in terms of the value of the projected edge histogram at each column (or bin).
• The intervals between every two characters are uniform, which can be represented by imposing a Gaussian density on the distance between adjacent characters,

(10)

which depends on the 1-D distance between the two characters and on the mean distance. The variance of the density is set to accommodate camera perspective distortions.
• We consider an intermediate constraint that bridges segmentation and recognition. This constraint indicates the readability of the segmented OOIs. In other words, the recognizer should produce a valid output for a properly segmented OOI [5]. Thus, the compatible function between a segmentation node and its recognition node is defined as

(11)

in terms of the distance metric used in [14], which measures the similarity between two patches; a small positive value that avoids a division by zero; and the shape contexts calculated from the edge maps of the character template for the given class label and of the segmented OOI specified by the given location.

It is worth mentioning that the proposed method can be extended to other applications of joint segmentation and recognition. The key is to define appropriate potential and compatible functions by considering various constraints or prior knowledge. The graphical model-based approach offers an integrated framework that unifies a variety of deterministic and statistical factors in a probabilistic form, where the efficient BP algorithm can be used to obtain the optimal solution to the joint segmentation and recognition task.
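To make the ingredients of Sections II-B and III concrete, the following Python sketch builds toy versions of the potential and compatible functions and runs sum-product BP on a two-layer chain of six characters. It is an illustration under stated assumptions, not the implementation evaluated in this letter: the function names (`spacing_compat`, `prior_potential`, `readability_compat`, `chain_bp`), the 0/1 prior values, the inverse-distance form of the readability term, the value of `sigma`, the class ordering, and the random stand-ins for the edge-histogram evidence and the shape-context distances are all hypothetical.

```python
import numpy as np

def spacing_compat(cols, mean_gap, sigma):
    """Gaussian compatibility of adjacent character centers, indexed [left, right]."""
    gap = cols[None, :] - cols[:, None]            # right column minus left column
    return np.exp(-0.5 * ((gap - mean_gap) / sigma) ** 2)

def prior_potential(n_chars=6, n_letters=26, n_digits=10):
    """Assumed 0/1 prior: first character is a letter, last three are digits."""
    is_digit = np.r_[np.zeros(n_letters), np.ones(n_digits)]  # classes: A..Z, then 0..9
    phi_r = np.ones((n_chars, n_letters + n_digits))          # unconstrained by default
    phi_r[0] = 1.0 - is_digit
    phi_r[-3:] = is_digit
    return phi_r

def readability_compat(dist, eps=1e-3):
    """Assumed inverse-distance form; dist[i, k, c] is a template-matching distance."""
    return 1.0 / (dist + eps)

def chain_bp(phi_s, psi_ss, psi_sr, phi_r):
    """Sum-product BP on a two-layer chain.

    phi_s:  (N, K) evidence of each segmentation node over K candidate columns
    psi_ss: (K, K) compatibility of adjacent segmentation nodes, [left, right]
    psi_sr: (N, K, C) compatibility of location k with class c for character i
    phi_r:  (N, C) prior potential of each recognition node
    Returns (belief_s, belief_r) with every row normalized to sum to 1.
    """
    n, k = phi_s.shape
    # Recognition node -> segmentation node: sum over class labels.
    m_r2s = np.einsum('ikc,ic->ik', psi_sr, phi_r)
    local = phi_s * m_r2s                  # all information at node i except its neighbors
    fwd = np.ones((n, k))                  # message arriving from the left neighbor
    bwd = np.ones((n, k))                  # message arriving from the right neighbor
    for i in range(1, n):
        m = (local[i - 1] * fwd[i - 1]) @ psi_ss
        fwd[i] = m / m.sum()               # normalize for numerical stability
    for i in range(n - 2, -1, -1):
        m = psi_ss @ (local[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    belief_s = local * fwd * bwd
    belief_s /= belief_s.sum(axis=1, keepdims=True)
    # Segmentation node -> recognition node: excludes the message received from it.
    m_s2r = np.einsum('ik,ikc->ic', phi_s * fwd * bwd, psi_sr)
    belief_r = phi_r * m_s2r
    belief_r /= belief_r.sum(axis=1, keepdims=True)
    return belief_s, belief_r

rng = np.random.default_rng(0)
cols = np.arange(10.0, 230.0, 2.0)                 # every second column of a 236-wide plate
phi_s = rng.random((6, cols.size)) + 1e-6          # stand-in for edge-histogram evidence
dist = rng.random((6, cols.size, 36))              # stand-in for shape-context distances
b_s, b_r = chain_bp(phi_s, spacing_compat(cols, 30.0, 5.0),
                    readability_compat(dist), prior_potential())
print("estimated centers:", cols[b_s.argmax(axis=1)])
print("estimated classes:", b_r.argmax(axis=1))
```

Because the network is a tree, one left-to-right and one right-to-left sweep visit every node and yield exact marginals. The dominant cost is the product with the K-by-K spacing matrix at each of the N characters, so the number of candidate columns K largely determines the running time.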

IV. EXPERIMENTAL RESULTS

Our experiments were conducted on 476 license plate images randomly captured in a city in China under uncontrolled circumstances. Each plate has six characters (excluding the Chinese character), for a total of 2856 characters. We performed skew correction on the extracted license plate images and normalized the image dimensions to 236 by 48. The width of the characters was set to 25 pixels, and the mean character distance in (10) was 30 pixels. The DLL implementation of the proposed method and the compared algorithms are available from our webpage (http://www.vcipl.okstate.edu/~fanxin/downloads.htm).

We present some segmentation results in Fig. 4. In each panel, the first two rows show the license plate image and the projected edge histogram, and the third row illustrates the segmentation results in the rectified image, where the positions of the segmented characters are indicated by vertical centerlines. The probability distribution of character segmentation is shown in the fourth row. The proposed approach works well even when the images exhibit connected characters [Fig. 4(a)], severe corruption by stains [Fig. 4(a) and (b)], irregular lighting [Fig. 4(a) and (c)], and physical distortion [Fig. 4(d)]. If segmentation and recognition are performed separately, however, the undesired peaks in the projection histogram caused by these distortions would frustrate traditional segmentation algorithms and consequently confuse the recognition module. In contrast, our approach gives a probabilistic estimate of the character segmentation and simultaneously recognizes the segmented OOIs.

Fig. 4. License images, the projected edge histograms, the segmentation results represented by the centerlines, and the estimated posterior probabilities under various conditions, i.e., connected characters (a), severe corruption by stains (a) and (b), irregular lighting (a) and (c), and physical distortion (d).

Note that the lines showing the segmentation results in Fig. 4 do not always indicate the exact centers of the characters. This is because the recognizer, which is insensitive to shift errors, can still produce valid and correct recognition results with imperfect segmentation. Thus, a validated (or confident) recognition may still accept an inaccurate segmentation, which usually does not degrade the overall performance. However, recognition may fail in rare cases such as the second character in Fig. 4(a), i.e., “c”, where the stains severely fragment the character. The performance can be improved by incorporating feature representations more elaborate than the projection histogram and/or a stronger classifier into the proposed graphical model.

Table I compares the proposed algorithm with two other methods: the Hausdorff-distance-based algorithm [15] and the shape context-based algorithm [14]. For both of them, we use the projected edge histogram to generate the segmented OOIs [12]. Our approach shows superior performance, mainly owing to the better character segmentation results and the incorporation of compositional semantics. For example, our approach always outputs digits for the last three characters, whereas the other two algorithms may mistakenly classify them as alphabetical letters, especially for pairs with great similarity, e.g., “B” and “8”.

TABLE I RECOGNITION RATE COMPARISON

Our algorithm could be further enhanced by incorporating more informative and complete compositional semantics about the characters in license plates. For example, neither the Hausdorff nor the shape context algorithm can discriminate between “O” and zero, as the templates of these two characters are identical. In creating Table I, we counted both answers as correct for all three algorithms. Still, our approach can distinguish these two characters if they are located at the first or the last three positions. This improvement is not revealed by Table I. However, discriminating these two characters at other positions demands more specific conditional dependencies among characters, e.g., “the second character must be ‘O’ if the third one is an alphabetic letter”, which could be incorporated in the graphical model as an additional constraint.

It takes about one minute to simultaneously segment and recognize a plate on a Pentium-IV computer with a 1.7-GHz CPU and 1 GB of RAM under the current parameter configuration without programming optimization, i.e., 36 possible values for the recognition variables and every second column near the peaks of the projection histogram as the possible values for the segmentation variables. A straightforward way to speed up the process is to reduce the number of possible segmentation positions. Alternatively, one can replace the uniform discretization with adaptive sampling algorithms, such as the particle filters in [16], to improve the efficiency.

V. CONCLUSION AND FUTURE WORK

In this letter, we consider the joint segmentation and recognition problem for a specific OCR application, i.e., ALPR. Both the low-level features and the high-level knowledge are integrated into a two-layer Markov network in which the two tasks are achieved simultaneously as the results of BP inference. Three further improvements are possible: 1) to incorporate more compositional semantics about characters; 2) to use more advanced classifiers; 3) to invoke more efficient inference algorithms [16]. Moreover, the proposed framework could be extended to more general cases, where a unified probabilistic framework can be developed to implement joint segmentation and recognition by incorporating various constraints into a two-layer graphical model.

ACKNOWLEDGMENT

The authors would like to thank W. Li for providing the license plate data set and plate correction programs. The authors would also like to thank the anonymous reviewers for their valuable comments and suggestions that improved this letter.

REFERENCES

[1] R. Casey and E. Lecolinet, “A survey of methods and strategies in character segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 7, pp. 690–706, Jul. 1996.
[2] X. Du and T. Bui, “A new model for image segmentation,” IEEE Signal Process. Lett., vol. 15, pp. 182–185, 2008.
[3] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley-Interscience, 2000.
[4] X. Jia, X. Wang, W. Li, and H. Wang, “A novel algorithm for character segmentation of degraded license plate based on prior knowledge,” in Proc. IEEE Int. Conf. Automation and Logistics, 2007, pp. 249–253.
[5] S. Chang, L. Chen, Y. Chuang, and S. Chen, “Automatic license plate recognition,” IEEE Trans. Intell. Transp. Syst., vol. 5, no. 1, pp. 42–53, Mar. 2004.
[6] M. Ostendorf, V. V. Digalakis, and O. A. Kimball, “From HMM’s to segment models: A unified view of stochastic modeling for speech recognition,” IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 360–378, Sep. 1996.
[7] Y.-S. Yun and Y.-H. Oh, “A segmental-feature HMM for speech pattern modeling,” IEEE Signal Process. Lett., vol. 7, no. 6, pp. 135–137, Jun. 2000.
[8] Z. Tu and S.-C. Zhu, “Parsing images into regions, curves, and curve groups,” Int. J. Comput. Vis., vol. 69, no. 2, pp. 223–249, 2006.
[9] P. Felzenszwalb and D. Huttenlocher, “Pictorial structures for object recognition,” Int. J. Comput. Vis., vol. 61, no. 1, pp. 55–79, 2005.
[10] W. Freeman, E. Pasztor, and O. Carmichael, “Learning low-level vision,” Int. J. Comput. Vis., vol. 20, no. 1, pp. 25–47, 2000.
[11] X. Fan, G. Fan, and D. Liang, “Joint segmentation and recognition of license plate characters,” in Proc. Int. Conf. Image Processing, 2007, vol. 4, pp. 353–356.
[12] S. Zhang, M. Zhang, and X. Ye, “Car plate character extraction under complicated environment,” in Proc. 2004 IEEE Int. Conf. Systems, Man and Cybernetics, 2004, pp. 4722–4726.
[13] S. Nomura, K. Yamanaka, O. Katai, H. Kawakami, and T. Shiose, “A novel adaptive morphological approach for degraded character image segmentation,” Pattern Recognit., vol. 38, no. 11, pp. 1961–1975, 2005.
[14] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
[15] R. Juntanasub and N. Sureerattanan, “Car license plate recognition through Hausdorff distance technique,” in Proc. 17th IEEE Int. Conf. Tools with Artificial Intelligence, 2005.
[16] E. Sudderth, A. Ihler, W. Freeman, and A. Willsky, “Nonparametric belief propagation,” in Proc. Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2003, vol. 1, pp. 605–612.