Correcting Handwriting Baseline Skew using Direct Methods on the Weighted Least Squares

S. J. A. Garnés (1), M. E. Morita (2), J. Facon (2), R. Sampaio (1,2), F. Bortolozzi (2), R. Sabourin (3)

(1) UFPR - Universidade Federal do Paraná, Curso de Pós-Graduação em Ciências Geodésicas, Caixa Postal 19011, 81531-041 Curitiba-PR, Brazil. Email: [email protected]

(2) PUCPR - Pontifícia Universidade Católica do Paraná, PPGIA - Programa de Pós-Graduação em Informática Aplicada, Rua Imaculada Conceição 1155, Prado Velho, 80215-901 Curitiba-PR, Brazil. Phone/fax: +55 41 330 1669. Emails: {marisa, facon, fborto}@ppgia.pucpr.br

(3) LIVIA - Laboratoire d'Imagerie, de Vision et d'Intelligence Artificielle, École de technologie supérieure, Département de génie de la production automatisée, 1100 rue Notre-Dame Ouest, Montréal, Québec, H3C 1K3, Canada. Email: [email protected]
Abstract: This article presents an approach to correct the baseline skew of handwritten words in images of bank check dates. The main goal of the approach is to reduce the use of empirical thresholds. The weighted least squares method is applied to the pseudo-convex hull obtained through mathematical morphology.

Keywords: Handwritten words, Pseudo-convex hull, Weighted least squares.
1. Introduction

Handwritten word recognition remains an active research area with many open problems. Such research is not a trivial pursuit, owing to several factors, chief among them the complexity inherent to human handwriting and its variability. To reduce that variability, many handwritten word recognition systems correct the handwriting baseline skew, so that the baseline can be extracted correctly, minimizing error in the recognition phase, as proposed in El Yacoubi's work [YAC96]. Because of the complexity of handwriting, several works require empirical thresholds, which makes analysis and implementation more complex. It will be shown that it is possible to reduce the use of thresholds in the correction of the handwriting baseline skew by employing a pseudo-convex hull obtained through mathematical morphology.

The state of the art in handwritten word skew correction is presented in section 2. The morphological pseudo-convex hull technique used to reduce the need for heuristics is presented in section 3. The approach developed to detect and eliminate undesirable minima in the handwriting baseline, based on weighted least squares, is presented in section 4. The approach to correct the handwriting baseline skew in bank check images based on pseudo-convex hulls [MOR98a] is presented in section 5. Results are presented in section 6.

2. State of the Art

Only a few authors in the literature describe in detail the approach used to correct the handwriting baseline skew. Guerfali [GUE93] corrects the baseline skew in two steps. In the first step, a coarse approximation of the baseline is obtained by dividing the word image into eight equal, successive regions and computing their eight mass centers. In the second step, a more refined location of the baseline is carried out on the basis of the local minima, using an iterative process to search for a satisfactory result; only the minima in the median region are taken into account. The straight line best fitting those minima is estimated through the least squares method, and a rotation is then performed using the angle given by the detected line. This procedure is repeated until the angle no longer exceeds 2 degrees, a limit he considers acceptable.

Guillevic [GUI95] mentions three methods to detect the baseline skew: principal-axis decomposition, least squares, and local minima. The principal-axis decomposition determines the average skew of an image by computing its principal axes [GON87], which are the eigenvectors of the covariance matrix. The least squares method is commonly used to detect the baseline skew by fitting a straight line to a set of data points, typically the local minima of the word. Local minima analysis is employed to filter the points used in the least squares fit. The baseline skew is corrected by rotating the image so that the baseline is aligned with the horizontal axis of the coordinate system. The procedure involved is shown in Fig. 1.
Fig. 1: Baseline skew correction (extracted from [GUI95]).

Guillevic implements this type of correction but does not employ it, since the average skew of the word images obtained was seldom greater than ±5 degrees. Madhvanath [MAD92] defines the baseline as the reference line through the image such that the sum of the squared distances of each pixel of the word to that line is minimum (axis of inertia). The correction is performed by Hook's transformation [YAC96]. Mohamed [MOH95] [MOH96] obtains an estimate of the baseline skew angle by evaluating the core of the given image through morphological operations. The elongation angle of the core image is used as an estimate of the skew angle, and the word skew is corrected through a rotation transformation whenever that angle lies within a reasonable, though unspecified, interval.

The methods presented so far assume that the baseline is a straight line, i.e., that the skew is constant. For isolated words those methods are acceptable, but not for a sequence of words: applied in such situations, they correct the skew of only some components of the image as a whole [YAC96]. El Yacoubi [YAC96] developed a technique that takes this problem into consideration, dividing the handwriting baseline skew correction into two steps.
In the first step, the overall word skew is estimated by least squares on the minimum points detected in the inferior external contour of the image, after a series of filtering operations based on empirical thresholds. Once the baseline is determined, the overall skew is corrected using Hook's transformation. The second step refines the result of the first: the baseline is now considered a series of segments linking the ordered minimum points, since words in most cases show two or more skews. The correction is applied to each image region confined between two minimum points through a transformation. Fig. 2 summarizes the handwriting baseline skew correction steps.
Fig. 2: Handwriting baseline skew correction steps (extracted from [YAC96]). (a) Original image, (b) Detection and filtering of the inferior contour local minima, (c) Baseline approximation, (d) Image with the corrected baseline skew.

Many works in the literature deal with detection and correction of the handwriting baseline skew using several empirical thresholds. In the technique introduced by El Yacoubi, many empirical thresholds had to be defined to eliminate the undesirable minima detected in words before correcting the handwriting baseline skew. For that purpose he employs a distance parameter dmax = 32 and a factor α = 3/5. Other factors, such as height and width, are employed to eliminate minima that are too close together. The minima thus filtered are fitted by linear regression and filtered again using yet another factor. Mohamed [MOH95] uses a horizontal structuring element sized according to the word length to correct the baseline skew; however, no detail is provided on how this sizing is performed. Côté [COT97] uses several empirical thresholds to determine word skew from contour projection histograms. The histogram entropy is used as a criterion to grade the skew, and reference lines are defined under the assumption that the histograms contain 3 peaks. Heuristic factors such as the maximum frequency fmax and 0.2 × fmax are used. Côté admits that the restrictive aspect of her method lies in the imprecision of the heuristic factors, which might impair the quality of the skew value.
The work presented by Bozinovic [BOZ89] detects only the baselines delimiting the body of the small letters, based on the analysis of the horizontal density histogram and the use of thresholds. Bozinovic presents in his paper a case in which the baseline is hard to find (Fig. 3).
Fig. 3: A hard-to-find baseline case (extracted from [BOZ89]).

On the basis of the papers studied, it is clear that the use of heuristics is frequent, which may hinder the development of automatic processes regardless of the type of application (dates, addresses, amounts in words, etc.). The main goal of this work is to reduce the use of thresholds when filtering undesirable minima to correct the handwriting baseline skew. In previous studies [MOR98b] [MOR98c], following the processing philosophy proposed by El Yacoubi, the pseudo-convex hull notion and the least squares technique were employed. This paper enhances that previous approach by using weighted least squares.

3. Morphological pseudo-convex hull

Mathematical morphology is a non-linear branch of image processing. Its basic principle consists in extracting information about the geometry and topology of an unknown set in an image [FAC96]. There are several ways to obtain the convex hull [GON87] [GRA83]. From the morphological standpoint, the binary convex hull of X may be obtained through the basic dilation operation $\mathrm{dil}_B(X)$ that, according to Serra [SER82], consists in dilating a set X by the structuring element B, as follows:

$$\mathrm{dil}_B(X) = \{x : B_x \cap X \neq \emptyset\}$$
By definition, the convex hull of a set X is the smallest convex set containing X. For handwritten word processing, however, only the general idea of the convex hull is used here, since applying the full convex hull reduces the quantity of minima in a word considerably. A pseudo-convex hull is used instead: it also reduces the minima in a word, but to a lesser degree than the convex hull, as may be observed in Fig. 4(b), 4(c) and 4(d). Two operations are employed to obtain the pseudo-convex hull: dilation and reconstruction by conditional dilation.
The reconstruction by conditional dilation is the binary reconstruction $\rho_S(Z)$ of a binary (finite) set S from a marker Z with the unit structuring element B, given by:

$$\rho_S(Z) = \lim_{n \to \infty} \underbrace{\mathrm{dil}^B_{cS}(\dots \mathrm{dil}^B_{cS}(Z))}_{n \text{ times}}$$

where each conditional dilation $\mathrm{dil}^B_{cS}$ is the dilation by B intersected with S.
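To make the operation concrete, the following sketch expresses reconstruction by conditional dilation with standard tools. Python and scipy are our choice of illustration, not the authors' implementation; the marker is assumed to lie inside the mask.

```python
import numpy as np
from scipy import ndimage

def reconstruct_by_conditional_dilation(marker, mask, structure):
    """Binary reconstruction rho_S(Z): repeatedly dilate the marker Z by the
    structuring element B, keeping the growth confined to the mask S, until
    the result no longer changes. scipy expresses this directly:
    iterations=-1 iterates to stability and mask confines the dilation.
    Assumes marker is a subset of mask."""
    return ndimage.binary_dilation(marker, structure=structure,
                                   iterations=-1, mask=mask)
```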
In the approach developed, the pseudo-convex hull is extracted in two ways. The first pseudo-convex hull, shown in Fig. 4(c) and 5(b), is obtained by the horizontal dilation of the word in the original image (set X), followed by the vertical conditional dilation, where the basic structuring elements employed are a horizontal line element $B_{hor} = \{\bullet\;\bullet\;\bullet\}$ (1 × 3) and a vertical line element $B_{ver}$ (3 × 1). The second pseudo-convex hull is obtained by simply swapping the structuring elements in the operations described above, as shown in Fig. 4(d) and 5(c). Thus, the pseudo-convex hull operation used in this case may be written as:

$$\mathrm{pseudo\_convex\_hull}(X) = \rho^{B_2}_{X}\big(\mathrm{dil}_{B_1}(X)\big)$$

where $B_1$ and $B_2$ represent, respectively, $B_{hor}$ and $B_{ver}$, or vice-versa.
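As a rough illustration of this construction (a sketch, not the authors' exact operator), the directional dilation followed by a conditional dilation is approximated below by a directional closing, i.e., dilation followed by erosion with the same element, which likewise wraps the word without displacing its outer contour; the intersection of the two orderings mirrors the combination used for Fig. 5(d). The 9-iteration default follows the text; the function names and the closing-based reading are assumptions.

```python
import numpy as np
from scipy import ndimage

B_HOR = np.ones((1, 3), dtype=bool)   # horizontal structuring element {. . .}
B_VER = np.ones((3, 1), dtype=bool)   # vertical structuring element

def directional_close(img, structure, n_iter=9):
    """n_iter dilations followed by n_iter erosions: fills concavities narrower
    than the grown element without displacing the outer contour."""
    grown = ndimage.binary_dilation(img, structure=structure, iterations=n_iter)
    return ndimage.binary_erosion(grown, structure=structure,
                                  iterations=n_iter, border_value=1)

def pseudo_convex_hull(word, n_iter=9):
    """Horizontal-first and vertical-first hulls, intersected as in Fig. 5(d)."""
    h1 = directional_close(directional_close(word, B_HOR, n_iter), B_VER, n_iter)
    h2 = directional_close(directional_close(word, B_VER, n_iter), B_HOR, n_iter)
    return h1 & h2
```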
Fig. 4: (a) Original image, (b) Convex hull image, (c) and (d) Pseudo-convex hull images.

For the pseudo-convex hull images in Fig. 4, 20 iterations were used, both in the dilation and in the reconstruction by conditional dilation. As the number of iterations decreases, the number of minima increases accordingly, as may be observed in the pseudo-convex hull images in Fig. 5, where 9 iterations were used; this number of iterations produced the best overall results in the handwriting baseline skew correction. The two extracted pseudo-convex hull images are similar but not identical, as may be observed in Fig. 5(b) and (c). The intersection of those two images, Fig. 5(d), therefore provides a pseudo-convex hull image with more detail than either the horizontal-first or the vertical-first hull alone.
Fig. 5: (a) Original image, (b) First pseudo-convex hull, (c) Second pseudo-convex hull, (d) Intersection of the two pseudo-convex hull images.

The result of the logical XOR operation presented in Fig. 6 shows that the pseudo-convex hull wraps the words without altering the vertical position of the most relevant word minima and maxima, which originate from the lower and upper baselines. This is very important: if the minima changed their vertical positions, the correction of the word undulation could be adversely affected.
Fig. 6: Logical XOR of Fig. 5(a) and 5(d).

4. Detection and elimination of minima

The main objective of employing the pseudo-convex hull is to decrease the use of empirical thresholds in this approach. The technique reduces the number of minima in a word so that, when filtering undesirable minima, few empirical thresholds have to be defined.

Because words are often fragmented, from now on the pseudo-convex hull of part of a word, or of a whole word, will be called a connected component. Fig. 7(a) shows an example where the connected component minima were detected on the inferior contour of the pseudo-convex hull. In Fig. 7(b), the minima were detected on the original inferior contour of the words. Using the pseudo-convex hull, only the most relevant minima are detected and stored, so that insignificant (desirable or undesirable) minima are eliminated, thereby avoiding the use of heuristics to eliminate the undesirable ones. Furthermore, the regular aspect of the pseudo-convex hull makes the minima easier to detect.

Minima are detected on the inferior contour of the connected components. The algorithm consists in sliding down the inferior contour until it rises again; when this happens, a minimum has been detected in the fall/rise. This process is repeated along the whole inferior contour (see the sketch below).

Once the minimum points are determined in each connected component using the pseudo-convex hull, they must be fitted to a straight line. The usual approach would be to apply the least squares method with the straight-line model y = ax + b, where a and b are parameters (a is the line slope and b the intercept) and x and y are the coordinates of the word minimum points.
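A minimal sketch of this detection, assuming the connected component is given as a binary array with row 0 at the top (so "falling" means a growing row index); the helper names are ours.

```python
import numpy as np

def inferior_contour(component):
    """Row index of the lowest foreground pixel in each column of a binary
    connected component (columns are assumed non-empty inside the hull)."""
    cols = [np.flatnonzero(component[:, c]) for c in range(component.shape[1])]
    return np.array([r.max() if r.size else -1 for r in cols])

def detect_minima(profile):
    """Slide along the inferior contour; a fall (row index grows) followed by
    a rise marks one minimum. Returns (column, row) pairs."""
    minima = []
    c, n = 1, len(profile)
    while c < n:
        if profile[c] > profile[c - 1]:                    # contour is falling
            while c < n - 1 and profile[c + 1] >= profile[c]:
                c += 1                                     # ride the fall and any flat bottom
            if c < n - 1 and profile[c + 1] < profile[c]:  # it rises again: minimum found
                minima.append((c, profile[c]))
        c += 1
    return minima
```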
Fig. 7: (a) Detection of minima on connected components employing the inferior pseudo-convex hull contour, (b) Example of detection of minima on words, using their original contour.

The criterion of minimizing the sum of the squared residuals $v_k$ is applied when there are more than two minimum points in the handwriting, i.e., n > 2. In mathematical notation, the least squares method is defined by:
$$\min_{a,b \in \mathbb{R}} \sum_{k=1}^{n} v_k^2 = \min_{a,b \in \mathbb{R}} \sum_{k=1}^{n} (a x_k + b - y_k)^2$$
When different weights are applied to the residuals, the weighted least squares method is defined by

$$\min \sum_{k=1}^{n} p_k v_k^2$$

where $p_k$ is the weight of the corresponding residual $v_k$. The parameter estimates are obtained both for the weighted case and for the case in which the weights are taken as unity. In both cases we perform a QR decomposition of A, so that A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix. This approach transforms the normal equations $A^T A x = A^T y$ into $(QR)^T (QR) x = A^T y$, and hence into the system $R^T R x = A^T y$, which is simpler and better to solve than the normal equations via Cholesky decomposition. In the case of minimization in $\mathbb{R}^2$, a simple reparametrization trick transforms the matrix A into a matrix whose columns are orthogonal, so that $A^T A$ is diagonal, which yields the parameter values directly. As long as A has full rank, the minimization problem is strictly convex and the solution is unique. If A is rank-deficient, there are infinitely many solutions, and the minimal-norm solution can be obtained by using the singular value decomposition to solve the normal equations.

The undesirable minima detection methodology proposed in this paper is known in Geodesy as the "Danish Method" [MIT86] or "Changing Weights" [LEI95]. It consists in decreasing, at each iteration, the weight of the points whose residuals exceed a pre-fixed value. One such weighting function [LEI95] is:

$$p_k^{(i+1)} = \begin{cases} p_k^{(i)} \, e^{-|v_k| / (3\sigma)} & \text{if } |v_k| \ge 3\sigma \\ p_k^{(i)} & \text{if } |v_k| < 3\sigma \end{cases}$$
where σ is the standard deviation, estimated by

$$\sigma = \sqrt{\frac{\sum_{k=1}^{n} p_k v_k^2}{n-2}}.$$
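A minimal sketch of the weighted fit, scaling the rows of the design matrix by the square roots of the weights and solving through QR as described above; numpy is our choice of illustration.

```python
import numpy as np

def wls_line_fit(x, y, p):
    """Fit y = a*x + b by weighted least squares. Scaling row k of the design
    matrix [x_k 1] and of y by sqrt(p_k) reduces the problem to ordinary least
    squares, solved here through the QR decomposition A = QR."""
    sw = np.sqrt(p)
    A = np.column_stack([x * sw, sw])
    Q, R = np.linalg.qr(A)                # R is 2x2 upper triangular
    a, b = np.linalg.solve(R, Q.T @ (y * sw))
    return a, b

def residual_std(x, y, p, a, b):
    """sigma = sqrt(sum(p_k * v_k^2) / (n - 2)), with v_k = a*x_k + b - y_k."""
    v = a * x + b - y
    return np.sqrt(np.sum(p * v * v) / (len(x) - 2))
```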
As a rule, the first iteration is performed without weights. In our case, in which handwritten words are processed, a different approach was adopted to increase the efficiency of the method: weights are defined and employed from the first iteration onwards. The criterion adopted to define the weights $p_k$, with k = 1, ..., n, is $p_k = 1/d_k^2$, where $d_k = \delta_k / \min_k(\delta_k)$ and $\delta_k = (y_k - y_{k-1})^2 + (y_k - y_{k+1})^2$. The weight of the residuals exceeding the standard deviation σ is then decreased during the first two iterations; for the ensuing iterations, the criterion adopted is to decrease the weight of the residuals exceeding 3σ. Furthermore, better results were achieved using 2σ instead of 3σ in the denominator of the exponential in the weighting function. The complete filtering loop is sketched below.
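Putting the pieces together, the loop below sketches the modified Danish filtering as described: initial weights from the δ_k, down-weighting above σ for two iterations and above 3σ afterwards, with 2σ in the exponential denominator. The treatment of the endpoint δ values and the iteration count are assumptions of the sketch; it reuses wls_line_fit and residual_std from the previous sketch.

```python
import numpy as np

def danish_filter(x, y, n_iter=6, eps=1e-12):
    """Iteratively down-weight minima whose residuals are too large, then
    return the final line parameters and weights."""
    n = len(y)
    delta = np.empty(n)
    for k in range(n):   # endpoints reuse their single neighbour (assumption)
        left = (y[k] - y[k - 1]) ** 2 if k > 0 else 0.0
        right = (y[k] - y[k + 1]) ** 2 if k < n - 1 else 0.0
        delta[k] = left + right
    delta = np.maximum(delta, eps)        # guard against a zero minimum
    p = 1.0 / (delta / delta.min()) ** 2  # p_k = 1/d_k^2, d_k = delta_k/min(delta)

    a = b = 0.0
    for i in range(n_iter):
        a, b = wls_line_fit(x, y, p)
        sigma = residual_std(x, y, p, a, b)
        if sigma < eps:                    # perfect fit: nothing left to filter
            break
        v = np.abs(a * x + b - y)
        thresh = sigma if i < 2 else 3.0 * sigma   # sigma twice, then 3*sigma
        bad = v >= thresh
        p[bad] *= np.exp(-v[bad] / (2.0 * sigma))  # 2*sigma in the denominator
    return a, b, p
```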
5. Correction of the handwriting baseline skew

Once the straight line is fitted by the weighted least squares method, the global word skew is evaluated and corrected by rotating the original image with the one-pass implementation of [DAN92]; this reduces the losses originating mainly from fractional pixel addresses and avoids gaps in the target image. If necessary, the minima detected in the month of the date are filtered to eliminate the undesirable ones, again using the weighted least squares approach. In the case of digits (day/year), the lowest minimum is preserved. Unlike the previous studies [MOR98a] [MOR98b] [MOR98c], Hook's transformation [YAC96] is no longer used to correct the skew of each connected set. The undulation of each connected component is now corrected, if needed, by moving all the minima belonging to the connected component to the vertical position of its first minimum, employing El Yacoubi's algorithm [YAC96]. Once the undulation is corrected, the connected components are vertically repositioned so as to be aligned with the first connected component. To ensure a good correction of the undulation with this algorithm, the elimination of the undesirable minima is fundamental.
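As an illustration of the two corrections: scipy's rotation stands in for the one-pass scheme of [DAN92], and the column-shift alignment is a simple stand-in for El Yacoubi's algorithm; the names and the linear interpolation of offsets are our own choices.

```python
import numpy as np
from scipy import ndimage

def correct_global_skew(img, a):
    """Rotate the image so the fitted baseline y = a*x + b becomes horizontal.
    The sign of the angle depends on the image coordinate convention (row
    indices grow downwards); order=0 keeps the image binary."""
    angle = np.degrees(np.arctan(a))
    return ndimage.rotate(img.astype(np.uint8), angle,
                          reshape=True, order=0).astype(bool)

def correct_undulation(img, minima):
    """Move every detected minimum to the vertical position of the first one
    by shifting columns, with offsets interpolated between consecutive minima
    (columns outside the minima span keep the nearest edge offset)."""
    xs = np.array([m[0] for m in minima], dtype=float)   # increasing columns
    ys = np.array([m[1] for m in minima], dtype=float)
    cols = np.arange(img.shape[1])
    offsets = np.rint(np.interp(cols, xs, ys - ys[0])).astype(int)
    out = np.zeros_like(img)
    for c in cols:
        s = offsets[c]                 # positive: column content moves up
        if s > 0:
            out[:-s, c] = img[s:, c]
        elif s < 0:
            out[-s:, c] = img[:s, c]
        else:
            out[:, c] = img[:, c]
    return out
```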
Fig. 8: (a) Original image, (b) Pseudo-convex hull image, (c) Connection of the pre-filtered minima segments, (d) Image with the corrected skew.
6. Results Achieved

The tests were applied to 713 images of Brazilian bank checks; 81% were correctly processed (as opposed to 70% in [MOR98c]). Fig. 8 shows an interesting result on a complex image. Fig. 9 shows the improvement achieved by using weighted least squares instead of ordinary least squares: in the word "junho", where few minima are detected through the pseudo-convex hull, only the weighted least squares technique allowed the elimination of the undesirable minimum in "j". Because of word fragmentation, unsolved problems remain, such as the one in Fig. 10, where the minimum of the fragmented component "ja" was eliminated, adversely affecting the correction. This shows that eliminating only the truly undesirable minima is fundamental, so as not to impair the handwriting baseline skew correction.
Fig. 9: (a) Original image, (b) Pseudo-convex hull image, (c) Image miscorrected by the approach in [MOR98c], (d) Accurately corrected image.

This study is part of an integrated research project on the feasibility of a bank document image processing and analysis system, funded by CNPq (National Scientific Research Council) under grant number 520324/96-0.
7. Conclusion

The main goal of this work was to demonstrate that it is possible to reduce the utilization of empirical thresholds in handwriting baseline skew correction. It was shown that, by combining binary mathematical morphology techniques for the determination of the pseudo-convex hull with the weighted least squares method, the number of empirical thresholds could be reduced. For handwritten words, though, the complete elimination of heuristics remains very difficult because of the complexity of handwriting. The use of the weighted least squares method reduced such problems, leading to better results. Nevertheless, because of word fragmentation and the sometimes small number of minima extracted by the pseudo-convex hull, some problems in the elimination of undesirable minima still occur, impairing the handwriting baseline skew correction. Combining this approach with classical techniques such as projection profiles is envisaged for future work.
Fig. 10: (a) Original image, (b) Pseudo-convex hull image, (c) Pre-filtered minima, (d) Inaccurately corrected handwriting baseline skew.

8. References

[BOZ89] Bozinovic R. M., Srihari S. N., Off-Line Cursive Script Word Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 1, Jan. 1989.

[COT97] Côté M., Utilisation d'un modèle d'accès lexical et de concepts perceptifs pour la reconnaissance d'images de mots cursifs, PhD Thesis, École Nationale Supérieure des Télécommunications, France, Jun. 1997.

[DAN92] Danielsson P. E., Hammerin M., High-Accuracy Rotation of Images, Graphical Models and Image Processing, vol. 54, no. 4, pp. 340-344, Jul. 1992.

[FAC96] Facon J., Mathematical Morphology: Theory and Examples, Gráfica Champagnat, Curitiba, Brazil, Oct. 1996.

[GON87] Gonzalez R. C., Wintz P., Digital Image Processing, Addison-Wesley, 1987.

[GRA83] Graham R. L., Yao F. F., Finding the Convex Hull of a Simple Polygon, J. Algorithms, 4:324-331, 1983.

[GUE93] Guerfali W., Plamondon R., Normalizing and Restoring On-Line Handwriting, Pattern Recognition, vol. 26, no. 3, pp. 419-431, 1993.

[GUI95] Guillevic D., Unconstrained Handwriting Recognition Applied to the Processing of Bank Cheques, PhD Thesis, Concordia University, Montreal, Quebec, Canada, Sep. 1995.
[LEI95] Leick A., GPS Satellite Surveying, 2nd edition, John Wiley & Sons, New York, 1995.

[MAD92] Madhvanath S., Govindaraju V., Using Holistic Features in Handwritten Word Recognition, United States Postal Service (USPS), vol. 1, pp. 183-198, 1992.

[MIT86] Mitishita E. A., Detecção de erros grosseiros em pontos de controle planialtimétricos para aerotriangulação, Revista Brasileira de Cartografia, no. 40, Jul., pp. 27-35, 1986.

[MOH95] Mohamed M. A., Handwritten Word Recognition using Generalized Hidden Markov Models, PhD Thesis, University of Missouri-Columbia, May 1995.

[MOH96] Mohamed M. A., Gader P., Handwritten Word Recognition using Segmentation-Free Hidden Markov Modeling and Segmentation-Based Dynamic Programming Techniques, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 5, May 1996.

[MOR98a] Morita M., Study to Enhance the Baseline Skew Correction of Handwritten Words, Master Thesis, CEFET-PR, Brazil, Mar. 1998.

[MOR98b] Morita M., Facon J., Sabourin R., Bortolozzi F., Approche Morphologique de Correction de l'Inclinaison des Mots dans l'Écriture Manuscrite, I CIFED, Conférence Internationale Francophone de l'Écrit et du Document, Québec, Canada, May 1998.

[MOR98c] Morita M., Bortolozzi F., Facon J., Sabourin R., Morphological Approach of Handwritten Word Skew Correction, X SIBGRAPI'98, International Symposium on Computer Graphics, Image Processing and Vision, Rio de Janeiro, Brazil, Oct. 1998.

[SER82] Serra J., Image Analysis and Mathematical Morphology, Academic Press, London, 1982.

[YAC96] El Yacoubi A., Modélisation Markovienne de l'Écriture Manuscrite: Application à la Reconnaissance des Adresses Postales, PhD Thesis, Université de Rennes 1, France, Sep. 1996.