Nonlinear Active Handwriting Models and Their ... - CiteSeerX

0 downloads 0 Views 146KB Size Report
Ψ = 1. M. ∑M k=1 ek. The centralized vector difference from the mean vector is given by Sk = ek − Ψ. The .... Find the active model Γ which is a pre-image in the.
Nonlinear Active Handwriting Models and Their Applications to Handwritten Chinese Radical Recognition G. S. Ng1

D. Shi1

S. R. Gunn2

R. I. Damper2

1

School of Computer Engineering, Nanyang Technological University, Singapore 639798 2

Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, United Kingdom

Email: [asgsng, asdmshi]@ntu.edu.sg; [srg,rid]@ecs.soton.ac.uk

Abstract

to take almost any smooth boundary with few constraints on their overall shapes. The idea of fitting by using image evidence to apply forces to the model and minimising an energy function is effective. The fundamental difference is the amount of prior knowledge which is incorporated. A snake only has generic prior knowledge, such as smoothness. A much greater amount of prior information can be recovered from training sets and encoded within deformable templates.

This paper proposes active handwriting models, in which kernel principal component analysis is applied to capture nonlinear handwriting variations. In the recognition phase, the chamfer distance transform and a dynamic tunnelling algorithm (DTA) are employed to search for the optimal shape parameters. The proposed methodology is successfully applied to a novel radical decomposition approach to the challenging problem of handwritten Chinese character recognition.

Recently, Chung and Ip [2] applied snakes to handwritten Chinese character recognition with energy functional minimisation. The external energy in their work consists of two different functionals, i.e., displacement and intersection functionals. The displacement functional is employed to avoid the snake deviating too much from the original template. The intersection functional is used to avoid the intersection of character strokes and the template. Experiments were conducted on 100 character categories written by 10 people, and the initial results were promising. However, a more appropriate technique should be introduced to deal with false salient features because of broken strokes and thinning algorithms.

1. Introduction Handwritten Chinese character recognition is one of the most difficult pattern recognition problems because it concerns complex structure, serious interconnection among the components, numerous pattern variation, and a large number of characters. The fundamental goals in handwriting recognition are to capture the inter-writer and intra-writer shape variations. There has been a growing interest in deformable models [3], or deformable templates [4], that are able to approximate a target object, by adjustment of their controlling parameters. They are capable of capturing the natural variability within a class of shapes and can be used in object extraction. Deformable templates are closely related to active contour models [5], known as snakes. Snakes are usually free

Cootes et al. [3] proposed active shape models (ASMs) to capture the shape variation and exploit the linear formulation of point distribution models (PDMs) in an iterative search procedure, capable of locating the modelled structures in noisy, cluttered images—even if they are partially occluded. ASMs have been developed and applied to handwriting recognition in our previous work [7, 8, 9]. Their 1

Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE

performance are proved to be better than snake fitting or stroke analysis. In this paper, non-linear active shape models are developed for handwriting recognition to investigate the nature of the non-linearity. Section 2 introduces relevant concepts and linear active shape modelling. Section 3 describes how active shape models are improved by kernel principal component analysis to capture nonlinear handwriting variations. In Section 4, the use of nonlinear active shape models in handwritten Chinese character recognition is detailed. Conclusions are given in Section 5.

2

Linear Active Handwriting Models

On the basis of active shape models [3], linear active handwriting models capture character variations with principal component analysis (PCA). One motivation of PCA is to reduce the size of the size of optimization space. Models are generated by adjusting a small number of shape parameters to fit a target character[8]. Given a set of examples for a character {e1 , e2 , . . . , eM }, which are represented by N landmark points (manually-determined in this work). Hence: ek = (xk0 , yk0 , . . . , xk(N −1) , yk(N −1) )T The mean M vector of the set is then defined by 1 Ψ= M k=1 ek . The centralized vector difference from the mean vector is given by Sk = ek − Ψ. The centralized character matrix A, is then formed as by [S1 S2 . . . SM ], having 2N rows and M columns. The eigenvectors, ui , andeigenvalues, λi , of the covariance matrix AAT = M 1 T k=1 Sk Sk are solutions to M (AAT )ui = λi ui . The variance explained by each eigenvector is equal to the corresponding eigenvalue. Most of the variation can usually be explained by a small number of modes, M  . Any example in the training set can be approximated using the mean vector and a weighted sum of these deviations obtained from the first M  modes: Γ=Ψ+U·b

(1)

Here U = (u1 , . . . , uM  ) is the matrix of the first M  eigenvectors, and the shape parameters, b = (b1 , . . . , bM  ) control the variation in shape around the mean radical.

3

Nonlinear Active Handwriting Modelling with Kernel PCA

From equation (1), we can see that the linear active handwriting models are only suitable for representing linear vari-

ations within the point distribution models. However, nonlinear shape variations may be common in handwriting, such as different writing styles from person to person, and time-varying distortion. Kernel PCA [6] provides a way to extend linear PCA to nonlinear subspaces of the data. Here, linear PCA is performed in some high-dimensional feature space F which is related to the input space by a nonlinear map Φ. The number of nonlinear components obtained by kernel PCA can be greater than the original input dimension; the dimension size of the feature space is specified by the number of training examples. However, the method will confer no advantage if the data lies in a linear subspace. The main challenge with this method is how to choose an appropriate nonlinear transformation. In the feature space, the covariance matrix takes the following form:

C=

M 1 ˜ ˜ j )T . Φ(ej )Φ(e M j=1

˜ i ) are the centralised point in the feature space Here, Φ(e F corresponding to the example ei , and M  ˜ i ) = Φ(ei ) − 1 Φ(e Φ(ej ). M j=1

(2)

The kth eigenvector Vk of the covariance matrix C and its eigenvalue λk are solutions to λk Vk = CVk . Since all ˜ 1 ), . . . , Φ(e ˜ M ), solutions with λk = 0 lie in the span of Φ(e k there exist coefficients αi (i = 1, . . . , M ) such that: k

V =

M 

˜ i ). αik Φ(e

i=1

The eigenvalue problem is then solved(see [6] for details): ˜ k M λk αk = Kα ˜ is an M × M matrix, and αk is a column vector where K of length M . However, there is no centralised data in F, so it is not possible to compute ˜ i ) · Φ(e ˜ j) ˜ ij = Φ(e K directly but relying on non-centered counterpart K: Kij = Φ(ei ) · Φ(ej ). Define the notations 1ij = 1 for all i, j, and (1M )ij =

Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE

˜ ij : 1/M to compute K ˜ ij = K =

=

M M 1  1  Φ(em )] · [Φ(xj ) − Φ(en )] M m=1 M n=1 M M 1  1  Kij − 1im Kmj − Kin 1nj M m=1 M n=1 M 1  + 2 1im Kmn 1nj M m,n=1

[Φ(ei ) −

(K − 1M K − K1M + 1M K1M )ij

(3)

For each non-zero λk , the eigenvector expansion coefficients αk are normalised, which leads to the corresponding vectors Vk in F being normalised. The projections of a data point e onto the eigenvectors Vk in F can be defined as: 

T

βk (e) = Vk Φ(e) =

M 



αik Φ(ei )T Φ(e) =

i=1

M 

˜ i , e). αik K(e

i=1

where M  is the number of principal components (modes). Any example in the training set can be approximated using the mean vector and a weighted sum of these deviations obtained from the first M  modes. Our purpose is to build up models for each handwriting class, which require approximate representations of the data in input space rather than in feature space. To this end, by introducing shape parameters, active shape models can also be generated on the basis of kernel PCA with the following two steps: 1. Generate active shape models in feature space with the mean vectors Ψ. An operator PM  ,b is defined by:

+

+

* +++ + + +

* *

+

* *

* * x

Figure 1. Illustration of pre-image in input space.

Figure 1, a point in the feature space will be away from this hyperplane under linear operations (e.g. component truncation and shape parameter). Hence, its pre-image will be approximated by the one corresponding to its nearest hyperplane point in the feature space. A shape model is denoted by Γ(b), which can be generated by changing parameters b. In the matching phase, the shape models for all the classes are superimposed upon the target image respectively. In order to locate the generated models on the target image quickly, a satisfactory image transformation process is required. As chamfer distance transform [1] have the ability to handle noisy and distorted data, as the edge points of one image are transformed by a set of parametric transformations, which describe how the images can be geometrically distorted in relation to one another. The energy is obtained by calculating chamfer distance between each points of the model and a target image [7, 8]. Hence the energy of a shape model Γ is given by:



˜ PM  ,b Φ(Ψ) =

M 

βk (Ψ)bk Vk .

E(b) =

k=1

j=1

2. Find the active model Γ which is a pre-image in the feature space so as to minimise: ρ(Γ)

=



=

˜ K(Γ, Γ) − 2

k=1

˜ +K(Ψ, Ψ)

bk βk (Ψ)

Ichamfer (γ j (b))

(6)

where γ j is the jth point of Γ, which is located in a 2dimensional position (xj , yj ).

2 ˜ ˜ ||PM  ,b Φ(Ψ) − Φ(Γ)|| M 

N 

M 

˜ i , Γ) αik K(e (4)

i=1

(5)

The following description may help in understanding equation (4). All the points in the input space will be mapped to a hyperplane in the feature space. As seen in

To avoid local minima in parameter searching, dynamic tunnelling algorithm (DTA, [12]) is incorporated with gradient descent [8]. The DTA concept is governed by the fact that any particle placed at a small perturbation from the equilibrium point will move away from the current point to another within a finite amount of time. Hence, the dynamic tunnelling procedure can jump to another basin of attraction where the new, initial search point is even lower in energy. Referring to equation (1), an optimization for the shape parameters is given by

Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE

∂E ∂b

N = i=1 ∂I (x , y ) N ∂b i i = i=1 ∂I (γ2i , γ2i+1 ) N ∂b∂I ∂x ∂I ∂yi = i=1 ( ∂xi ∂bi + ∂y ) i ∂b = 0.

(7)

In the nonlinear case in this paper, the shape parameters are considered in the feature space. However, the parameter vectors are not orthogonal in the input space, and there is no direct gradient descent information. Hence, the application of DTA in the nonlinear case is achieved by multi-point sampling.

4

Experiments on Handwritten Chinese Radical Recognition

In our experiments, the nonlinear active shape modeling was applied to handwritten Chinese character recognition, which is one of the most difficult pattern recognition problems, because it concerns a great number of classes and many variations. Although there are many thousands of Chinese characters, these can be decomposed into a small number of radicals (or sub-characters). Therefore, the complex character recognition problem is converted to a simpler problem of radical extraction and optimization of a combination of the radical sequences. In this section, active handwriting models will be applied to radical extraction for handwritten Chinese characters. After experimentation with polynomial, spline and Gaussian kernels, and the latter was chosen as the most suitable candidate. Figure 2(a) shows an example of character skeleton and its chamfer transformed image. This character consists of two radicals, and Figure 2(b) is the mean model of its lower radical. Figure 2(c) shows active models with the number of principal components changed from 1-5, and the chamfer distance between the models and the input skeleton are 198, 104, 43, 32, 26 respectively. Our experiments were conducted on the HITPU database [10], which was collected by Harbin Institute of Technology and Hong Kong Polytechnic University, and comprises a collection of 3755 categories written by 200 different writers. We are now in a position to compare some representative works on radical extraction with our proposed nonlinear active shape modelling method. Method 1: Nonlinear active shape models with kernel PCA. The experiments are conducted on 98 radicals covering 1400 loosely-constrained Chinese character categories written by 200 different writers (i.e., 280,000 characters). Method 2: Active shape models with linear PCA [7, 8]. The experiments are conducted on the above dataset.

(a)

E(b)=

198

(b)

104

43

32

26

(c) Figure 2. Active model generation by adjusting shape parameters. (a) An example of character skeleton and its chamfer transformed image. (b) Mean models. (c) Active models generated with the number of principal components from 1 to 5.

Method 3: Stroke-based approach [11]. Their experiments for radical extraction were conducted on just 32 radicals and 1856 test characters. Method 4: Snake-fitting approach [2]. Their experiments were conducted on 100 character categories written by 10 people (i.e., 1000 test examples only). They considered and reported results on six most common radical combination schemes, namely, vertical (V), left-down (LD), surrounding (S), horizontal (H), up-left (UL), and cover (C), respectively. From Table 1, we can see that the method of our nonlinear active shape models is easily the best among the existing radical approaches. It deals with the largest number of radicals on a test set which is significantly larger than other workers have used, and still achieves the best correct matching rate. The poorer performance of method 3 is a consequence of ambiguity when strokes intersect. At the point of intersection, it is problematic as to which of the radiating lines should be grouped together so that some strokes may be spurious. With very little prior knowledge, snakes are forced on to the image by smoothness and some salient features, so the method 4 is suffering from false salient features because of broken strokes and thinning algorithms. The advantage of our proposed active models is the capabilities to handle the individual writer variation with only a small number of shape parameters. Our models can also avoid stroke extraction, which is difficult in handwriting recognition as there will be considerable interconnection among the strokes as well as many broken strokes. How-

Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE

Table 1. Performance comparison of different radical approaches to Chinese character recognition. Test set size (characters) Method 1 (Nonlinear ASM) Method 2 (ASM, linear PCA) Method 3 (stroke-based) Method 4 (snake fitting)

280,000

Number of radicals trained 98

% radicals correct 96.5

280,000

98

94.2

1856

32

92.5

1000

100

89.0 (V) 78.5 (H) 81.0 (LD) 68.0 (UL) 95.0 (S) 75.0 (C)

ever, this method requires longer matching time caused by working at the pixel level and shape-parameter searching.

5

Conclusions

This paper has introduced a novel nonlinear active shape models for handwriting recognition. In training, kernel principal component analysis is employed to capture the main nonlinear variation of the training examples around the mean vector. In matching, the dynamic tunnelling algorithm is employed to search for the optimal shape parameters in terms of chamfer distance minimization. This methodology is applied to handwritten Chinese character recognition, and achieves superior performance compared with existing similar representative works. We can conclude that there is non-linear variation within handwritten Chinese characters and building non-linear active handwriting modelling into the algorithm can improve recognition performance.

[3] T. F. Cootes, C. J. Taylor, D. H. Cooper and J. Garaham, “Active shape models—their training and application”, Computer Vision and Image Understanding, Vol. 61, No. 1, pp. 38-59, 1995. [4] A. K. Jain and Y. Zhong and S. Lakshmanan, “Object Matching Using Deformable Templates”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 3, pp. 267-278, 1996. [5] M. Kass, A. Witkin and D. Terzopoulos, “Snakes: Active contour models”, International Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, 1988. [6] B. Sch¨olkopf, A. J. Smola and K. M¨uller, “Kernel Principal Component Analysis”, B. Sch¨olkopf, C. J. C. Burges and A. J. Smola (eds.), Advances in Kernel Methods, MIT Press, Cambridge, pp. 327-352, 1998. [7] D. Shi, S. R. Gunn and R. I. Damper, “Active radical modeling for handwritten Chinese characters”, Proceedings of Sixth International Conference on Document Analysis and Recognition, Seattle, WA, pp.236-240, 2001. [8] D. Shi, S. R. Gunn and R. I. Damper, “A Radical approach to handwritten Chinese character recognition using active handwriting models”, In:Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Hawaii, 2001. [9] D. Shi, S. R. Gunn and R. I. Damper, “Handwritten Chinese character recognition using nonlinear active shape models and viterbi algorithm”, Pattern Recognition Letters, Vol. 23, No.14, pp.1853-1862, 2002. [10] D. Shi, “An active radical approach to handwritten Chinese character recognition”, PhD thesis, University of Southampton, UK, 2002. [11] A. B. Wang and K. C. Fan, “Optical recognition of handwritten Chinese characters by hierarchical radical matching method”, Pattern Recognition, Vol. 34, No. 1, pp. 15-35, 2001. [12] Y. Yao, “Dynamic Tunneling Algorithm for Global Optimization”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 5, pp.1222-1230, 1989.

References [1] G. Borgefors, “Hierarchical chamfer matching: A parametric edge matching algorithm”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 6, pp. 849865, 1988. [2] F. L. Chung and W. W. S. Ip, “Complex Character Decomposition Using Deformable Model”, IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, Vol.31, No. 1, pp.126-132, 2001.

Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE