Design of an Automated Data Entry System for Handwritten Forms Lim Woan Ning, Marzuki Khalid* and Rubiyah Yusof Centre for Artificial Intelligence and Robotics (CAIRO) Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Jalan Semarak, 54100 Kuala Lumpur e-mail:
[email protected] (* all correspondence should be sent to)

ABSTRACT

In this new information era, data and information are the most important assets of organizations. A large amount of money and manpower is spent on data gathering, data entry, and storage every year. In Malaysia, data gathering is still largely done through manually-filled forms, and the data are then entered manually into databases in government and private organisations. Such a mode of data entry and storage requires a lot of manpower and is time consuming. At the Centre for Artificial Intelligence and Robotics (CAIRO) of Universiti Teknologi Malaysia, research is being carried out to design a system for automated data entry of handwritten-filled forms. The system consists of a high-speed scanner with an auto-feeder and a computer. In the first phase, software was developed that allows the user to design a template of an existing form such that only the regions of interest are captured. The next phase involved the design of software to capture handwritten characters in the regions of interest of the scanned forms. Image processing techniques are then used to filter and improve the images of the scanned handwritten characters before they are recognized using a neural network algorithm. Once the characters are identified and verified, they are automatically stored in a database. This system can be used efficiently in many organizations that involve gathering and processing of large amounts of data, such as the National Registration Department, Survey Research Malaysia, Kementerian Pendidikan and Lembaga Hasil Dalam Negeri.

Keywords: Form Processing, Automated Data Entry, Handwritten Character Recognition, Neural Network.

INTRODUCTION

The Malaysian government is moving towards an e-government system with the use of sophisticated technology beyond the 21st century. This is evident from the setting up of the government complex at Putrajaya and the Multimedia Super Corridor.
As more and more employees are needed for more demanding jobs, we need to develop better machines to replace manual labour-intensive jobs. As computer database systems in Malaysia expand at an ever-increasing rate, government agencies and the private sector will have to find new solutions for data entry. Currently, almost all data entry jobs in the country are done manually. Such a technique is laborious and slow, which is not only counter-productive but also wastes manpower that could be used for other, more demanding jobs. At the Centre for Artificial Intelligence and Robotics (CAIRO), a system for automatic data entry of very high volumes of manually-filled forms is currently being developed. The system comprises a fast scanner with an auto-feeder and a computer. Software with the appropriate algorithms is currently being developed for communication between the scanner and the computer, and for processing the forms. Image processing and neural network techniques are used to identify and recognize handwritten characters in the forms.
Once completed, the system will be able to process handwritten forms at high speed and store the information extracted from them automatically in existing databases. Such a system would eliminate the need for manual data entry as currently practised. Hundreds of workers in labour-intensive manual data entry jobs could be trained for higher value-added jobs in manufacturing or elsewhere to meet the country's acute labour shortage (when the economy recovers). Moreover, the system would be cheaper in the long run and, more importantly, forms could be processed at a much faster rate to meet the more demanding e-government style of administration in the next millennium.

Handwritten character recognition is a field of pattern recognition that has received a great deal of interest over the past decades. Nowadays character recognition is implemented in numerous applications such as address and zip code recognition, signature identification, and forms processing, to name a few. With the advent of neural networks, more successful results have been achieved, and there has been a lot of interest in applying neural networks to such applications. Moreover, several commercial OCR products incorporating neural networks as their core recognition tools are already available. In the system that we designed, we used a self-organizing neural network, the Fuzzy Adaptive Resonance Theory Mapping or Fuzzy ARTMAP, as the recognition algorithm. This is a relatively new neural network paradigm which has a number of advantages over the popularly-used conventional back-propagation algorithm.

Forms processing can be regarded as a large-scale OCR task which differs from common OCR applications, for example those used for occasional scanning of a small number of individual pieces of text [1]. Form processing involves the recognition of several thousands of documents per day. The large number of documents involved implies that big savings are possible even for recognition rates well below 100%. The main requirements of such a system are a relatively low recognition failure rate, which avoids costly checking and remedying operations, and a processing speed at least equivalent to that of a trained typist. In many forms processing systems, frame recognition and segmentation techniques are used to extract the characters from the form. However, since forms may be designed in many different ways, this increases the error rate, for example through mis-processing of some characters and incorrect extraction of data. Thus, in our system, we designed a module that allows the user to re-design existing forms such that only the regions of interest are extracted once the form is scanned.

The rest of this paper is organized as follows. The next section presents an overview of the proposed system. We then discuss the software module with which the user can re-design existing forms for direct use with the recognition software. The following three sections discuss the algorithms used in the software, starting with the image processing techniques, followed by the feature extraction technique and the Fuzzy ARTMAP neural network. The last section concludes the paper.
OVERVIEW OF THE PROPOSED SYSTEM

The automated data entry system consists of a high-speed scanner with an auto-feeder, a high-speed computer and software. In recent years, scanner technology has improved tremendously and costs have dropped greatly. However, for this system to be efficiently utilized, we recommend high-speed scanners that cost in the range of RM30,000 to RM50,000 each. A personal computer can be used; however, a workstation is recommended for its speed, as a number of algorithms need to be executed. The processing time for one form depends on the amount of information that needs to be extracted from it: the more information to be extracted, the longer the processing takes, and thus a fast computer is necessary in order to match the speed of humans. The software is the major component of the system. We divided the software into six modules, which are as follows:
(a) Template configuration module,
(b) Skew detection module,
(c) Image processing module,
(d) Feature extraction module,
(e) Recognition and classification module, and
(f) Storage module.

Figure 1 shows the processing steps of the automated data entry system. Before the system can be used, forms must be designed appropriately such that data can be extracted correctly for the purpose of isolated character recognition.
Fig. 1. Processing steps of the automated data entry system: the handwritten-filled form is scanned; template configuration and skew detection are applied; this is followed by image processing, feature extraction, character recognition by the neural network, verification, and finally storage in a database.
In this system, the forms must be designed following some recommended specifications in order to obtain optimum results. Each form must contain two targets of different appearance, one at the top right corner and another at the bottom left corner, as shown in Fig. 2.
Fig. 2. The two alignment targets of a form: Indicator 1 (P1) at the top right and Indicator 2 (P2) at the bottom left.

Alignment of the scanned form is done by first finding the two targets. These are used to detect whether the form is correctly positioned in the scanner. If existing forms are available, the form is first scanned to make a template. Using the Template Configuration module, the target locations, character locations, field locations and field types can be set. The form field may be designed in either one of two formats, both consisting of a field label such as "Name :" followed by boxes for the characters.

Copy and paste facilities are available in this module so that forms can be designed quickly. In this system there is no need to use special inks to print the forms, as required by some other systems; the only constraint is the colour of the boxes, which must be as light as possible, preferably light grey. The purpose is to reduce noise when processing the image. Once the form goes into the scanner, the Skew Detection module re-aligns the form so that the boxes are correctly positioned for character detection. The rotation and translation of a form are calculated from the two target points, P1 (X1, Y1) and P2 (X2, Y2), which are detected by pattern matching. The calculation of the misalignment of a form is shown graphically in Fig. 3.

Fig. 3. Calculation of the rotation and translation of a mis-aligned form. P1 and P2 are the template form indicators; P1' (X1', Y1') and P2' (X2', Y2') are the processed form indicators; dx and dy (dx' and dy') are the horizontal and vertical distances between the indicators, and θ (θ') is the corresponding slope.

For rotation, the following calculation is needed:

    θ = dy / dx
    θ' = dy' / dx'
    Rotation = θ - θ'                                     (1)

and for translation, it can be calculated in the following way:

    Center(x, y) = ( (x1 + x2) / 2, (y1 + y2) / 2 )
    Center'(x', y') = ( (x1' + x2') / 2, (y1' + y2') / 2 )
    Translation = Center - Center'                        (2)

IMAGE PROCESSING MODULE

After the form has been aligned, each character is extracted and image processing techniques are applied. These include smoothing, sharpening, histogram equalization, thresholding, blob analysis, and normalization of each of the characters. An example of a raw character being extracted is shown in Fig. 4.

Fig. 4. Extraction of a raw character from the scanned form.

In order to extract the character without noise, smoothing and sharpening are required before thresholding. The smoothing and sharpening filters are shown in Fig. 5, and the effect of applying them to a character "4" is shown in Fig. 6.

    Smoothing:         Sharpening:
     1  2  1            0 -1  0
     2  4  2           -1  5 -1
     1  2  1            0 -1  0

Fig. 5. Example of a smoothing and sharpening filter.

Fig. 6. Effect of smoothing and sharpening on a handwritten character "4": thresholding the original image directly gives a binarized image with noise, whereas smoothing and sharpening the image before thresholding gives a noiseless binarized image.
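Returning to the skew detection step, Eqs. (1)-(2) above can be sketched in Python as follows. This is a minimal illustration with my own function name and point layout; `atan2` is used in place of the raw dy/dx ratio to avoid division by zero for vertical target lines.

```python
import math

def alignment(template_pts, scanned_pts):
    """Estimate the rotation and translation of a scanned form from its
    two reference targets, following Eqs. (1)-(2).  Points are (x, y)
    tuples: P1 is the top-right target, P2 the bottom-left one."""
    (x1, y1), (x2, y2) = template_pts
    (x1s, y1s), (x2s, y2s) = scanned_pts

    # Eq. (1): rotation is the difference between the angle of the
    # P1-P2 line in the template and in the scanned form.
    theta = math.atan2(y2 - y1, x2 - x1)
    theta_s = math.atan2(y2s - y1s, x2s - x1s)
    rotation = theta - theta_s

    # Eq. (2): translation is the displacement of the midpoint
    # between the two targets.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    cxs, cys = (x1s + x2s) / 2, (y1s + y2s) / 2
    translation = (cx - cxs, cy - cys)
    return rotation, translation
```

For a form shifted by (5, 5) pixels without rotation, the function returns a zero rotation and a translation of (-5, -5), which the Skew Detection module would then undo before extracting the character boxes.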
The next process before thresholding is histogram equalization, one of the most commonly used contrast manipulation techniques. It expands the contrast of a dim image by re-assigning pixel brightness levels. Figure 7 shows an example of a gray-scaled image with its histogram. The value of each gray level is given by the following equation:

    Vi = Σ(x=1..M) Σ(y=1..N) [ 1 if f(x,y) = i, 0 otherwise ],  ∀ i = 1..256     (3)

where Vi denotes the number of pixels that have gray value i, M and N denote the number of pixels per column and row, respectively, and f(x,y) denotes the brightness value of the pixel at position (x,y).

Fig. 7. Gray-scaled handwritten character "8" with its histogram (darkest to brightest) before equalization.

Histogram equalization uniformly redistributes the gray level values of the pixels within an image so that the number of pixels at any one gray level is about the same [2]. This makes it possible to see minor variations within regions that appear nearly uniform in the original image. The equation of the histogram equalization is given as follows:

    g'(i,j) = (gmax - gmin) P[g(i,j)] + gmin     (4)

where g' denotes the histogram-equalized image and g denotes the original image. The parameters gmax and gmin are the maximum and minimum gray levels, respectively. P[g(i,j)] is the value of the cumulative distribution function at gray level g(i,j), where P(j) is defined as follows:

    P(j) = (1/T) Σ(i=1..j) Ni     (5)

where Ni denotes the number of pixels in the image with brightness level equal to i, and T is the total number of pixels in the image (i.e. the total area of the histogram). The result of the equalization is shown in Fig. 8.

Fig. 8. Gray-scaled handwritten character "8" with its histogram after equalization.

After the histogram equalization, the threshold value is obtained by implementing the Otsu technique, which is a nonparametric and unsupervised method of automatic threshold selection [3]. Two threshold values, Threshold A and Threshold B, are obtained by applying the Otsu thresholding technique from two directions, scanning the histogram from the left and from the right as shown in Fig. 9, and the average threshold value, (Threshold A + Threshold B) / 2, is then calculated. The character image is binarized using this threshold value.

Fig. 9. Example of the Otsu thresholding technique on a histogram to find the threshold values.

After the thresholding process, blob analysis is implemented on the binarized character image to reduce noise. The region of interest (ROI) is then normalized to a size of 32x32 pixels.

Fig. 10. Example of blob analysis and normalization on an image.
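The image processing steps described above — smoothing and sharpening with the Fig. 5 kernels, histogram equalization per Eqs. (4)-(5), and Otsu thresholding — can be sketched as follows. This is a minimal NumPy illustration with my own function names; the smoothing kernel is normalized by its sum (my addition, so brightness is preserved), and since the paper does not fully specify its two-direction histogram scan, standard Otsu is applied once.

```python
import numpy as np

# Fig. 5 kernels (smoothing divided by its sum of 16 to preserve brightness)
SMOOTH = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 16.0
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], float)

def convolve3x3(img, kernel):
    """Plain 3x3 filtering with edge replication (no SciPy needed)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def equalize(img):
    """Histogram equalization, Eq. (4), using the CDF of Eq. (5)."""
    g = img.astype(int)
    hist = np.bincount(g.ravel(), minlength=256)   # Vi of Eq. (3)
    cdf = np.cumsum(hist) / g.size                 # P(j) of Eq. (5)
    gmin, gmax = g.min(), g.max()
    return (gmax - gmin) * cdf[g] + gmin           # g' of Eq. (4)

def otsu(hist):
    """Otsu's method [3]: pick the threshold maximising the
    between-class variance of the histogram."""
    levels = np.arange(hist.size)
    best_t, best_var = 0, -1.0
    for t in range(1, hist.size):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * hist[:t]).sum() / w0
        mu1 = (levels[t:] * hist[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

A character image would then be binarized as `img > otsu(np.bincount(img.ravel(), minlength=256))`, with blob analysis and 32x32 normalization applied afterwards.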
FEATURE EXTRACTION

In character recognition, there is a need to extract the significant features of the character. In this application, feature extraction is done using the Kirsch edge-detection technique, a simple algorithm which detects the edges of an image [4]. Its simplicity makes processing rather fast. The algorithm is implemented as a mask operation, as shown in Fig. 11.
    M0   M1   M2
    M7 (i,j)  M3
    M6   M5   M4

Fig. 11. Definition of the eight neighbors of a pixel (i,j).

The operation is based on the non-linear edge enhancement algorithm given as follows:

    P(i,j) = max( 1, max(k=0..7) | 5·Sk - 3·Tk | )     (6)

where

    Sk = Mk + Mk+1 + Mk+2
    Tk = Mk+3 + Mk+4 + Mk+5 + Mk+6 + Mk+7

P(i,j) is the gradient of pixel (i,j), and the subscripts of M are evaluated modulo 8. From this algorithm, the four directional feature vectors, horizontal (H), vertical (V), right-diagonal (R) and left-diagonal (L), are calculated as follows:

    P(i,j)H = max( |5S0 - 3T0| , |5S4 - 3T4| )     (7)
    P(i,j)V = max( |5S2 - 3T2| , |5S6 - 3T6| )     (8)
    P(i,j)R = max( |5S1 - 3T1| , |5S5 - 3T5| )     (9)
    P(i,j)L = max( |5S3 - 3T3| , |5S7 - 3T7| )     (10)

The calculations can be expressed as simple mask operations, as shown in Fig. 12.

    (a) Horizontal:       5  5  5     -3 -3 -3
                         -3  0 -3     -3  0 -3
                         -3 -3 -3      5  5  5

    (b) Vertical:        -3 -3  5      5 -3 -3
                         -3  0  5      5  0 -3
                         -3 -3  5      5 -3 -3

    (c) Right-diagonal:  -3  5  5     -3 -3 -3
                         -3  0  5      5  0 -3
                         -3 -3 -3      5  5 -3

    (d) Left-diagonal:    5  5 -3     -3 -3 -3
                          5  0 -3     -3  0  5
                         -3 -3 -3     -3  5  5

Fig. 12. Kirsch's four-directional feature masks: (a) horizontal, (b) vertical, (c) right-diagonal, (d) left-diagonal.

The results of the Kirsch edge-detection on the character image are shown in Fig. 13: the original binarized image yields four directional images, each obtained by taking the pixel-wise maximum response of a mask pair. After the Kirsch edge-detection, each image is compressed from 32x32 pixels to 8x8 pixels in order to reduce the input size to the neural network. The compression is based on the following formula:

    T(i,j) = (1/4) Σ(x=2i-1..2i) Σ(y=2j-1..2j) S(x,y)     (11)

where T(i,j) denotes the value of the target pixel at position (i,j) and S(x,y) denotes the value of the source pixel at position (x,y).

Fig. 13. Results of Kirsch edge-detection on the character H.
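The directional masks of Fig. 12 and the compression of Eq. (11) can be sketched as follows. This is a NumPy illustration with my own function names; since Eq. (11) halves each dimension, it is applied twice here to go from 32x32 to the 8x8 images described in the text.

```python
import numpy as np

# Kirsch masks of Fig. 12; each direction keeps, per pixel, the larger
# absolute response of its pair of opposite masks, Eqs. (7)-(10).
KIRSCH = {
    "horizontal": [np.array([[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]]),
                   np.array([[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]])],
    "vertical":   [np.array([[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]]),
                   np.array([[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]])],
    "right_diag": [np.array([[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]]),
                   np.array([[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]])],
    "left_diag":  [np.array([[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]]),
                   np.array([[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]])],
}

def directional_maps(img):
    """Apply each mask pair to a binary image and keep, per pixel,
    the larger absolute response (Eqs. (7)-(10))."""
    padded = np.pad(img.astype(float), 1)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    maps = {}
    for name, (m1, m2) in KIRSCH.items():
        r1 = np.abs((windows * m1).sum(axis=(2, 3)))
        r2 = np.abs((windows * m2).sum(axis=(2, 3)))
        maps[name] = np.maximum(r1, r2)
    return maps

def compress2x2(img):
    """Eq. (11): average each non-overlapping 2x2 block (halves each side)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def features(img32):
    """320-element feature vector: the original 32x32 image plus its four
    directional maps, each reduced to 8x8 by two rounds of Eq. (11)."""
    planes = [img32.astype(float)] + list(directional_maps(img32).values())
    return np.concatenate([compress2x2(compress2x2(p)).ravel() for p in planes])
```

Each normalized character thus yields five 8x8 planes flattened into the 5 x 64 = 320 inputs described below.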
After compression, an image contains 64 pixels, and the value of each pixel is used as a feature input to the neural network. The Kirsch edge-detection yields four images (horizontal, vertical, right-diagonal and left-diagonal), as shown in Fig. 13. Thus, including the original image, there is a total of five images, yielding 5 x 64 = 320 features. These features are sent to the input of the neural network for recognition.

CHARACTER RECOGNITION

A self-organizing neural network known as Fuzzy ARTMAP is used for the classification and recognition of the isolated characters. ARTMAP is a class of neural network that performs incremental supervised learning of recognition categories and multidimensional maps in response to binary input vectors presented in arbitrary order [5, 6]. Fuzzy ARTMAP extends ARTMAP by integrating fuzzy logic into the system, which allows it to accept input vector values between 0 and 1. The Fuzzy ARTMAP neural network consists of two fuzzy ART modules (ARTa and ARTb) which are linked together via an inter-ART module, Fab, as shown in Fig. 14. The Fuzzy ART algorithm is based on the three main equations given as follows:

    Tj = | I ∧ Wj | / ( α + | Wj | )     (12)

    Vj = | I ∧ Wj | / | I |  ≥  ρ     (13)

    WJ(new) = β ( I ∧ WJ(old) ) + ( 1 - β ) WJ(old)     (14)

where
    α denotes the choice parameter (α > 0),
    β denotes the learning parameter (β ∈ [0,1]),
    ρ denotes the vigilance parameter (ρ ∈ [0,1]),
    I denotes the input vector,
    Wj denotes the weight for node j,
    Tj denotes the choice value,
    Vj denotes the match value, and
    WJ denotes the weight of the winner node.
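The three Fuzzy ART equations can be sketched as a minimal module in Python. The class and parameter names below are illustrative, not from the paper's implementation; inputs are assumed complement-coded in [0, 1], and a full Fuzzy ARTMAP would pair two such modules through the inter-ART map with match tracking.

```python
import numpy as np

class FuzzyART:
    """Minimal sketch of one Fuzzy ART module implementing Eqs. (12)-(14)."""

    def __init__(self, alpha=0.001, beta=1.0, rho=0.75):
        self.alpha, self.beta, self.rho = alpha, beta, rho
        self.w = []  # one weight vector per committed category

    def _choice(self, i):
        # Eq. (12): Tj = |I ^ Wj| / (alpha + |Wj|), ^ = element-wise min
        return [np.minimum(i, w).sum() / (self.alpha + w.sum()) for w in self.w]

    def train(self, i):
        # Search committed categories in order of decreasing choice value.
        order = np.argsort(self._choice(i))[::-1] if self.w else []
        for j in order:
            # Eq. (13): vigilance test, Vj = |I ^ Wj| / |I| >= rho
            if np.minimum(i, self.w[j]).sum() / i.sum() >= self.rho:
                # Eq. (14): update the winner's weights
                self.w[j] = (self.beta * np.minimum(i, self.w[j])
                             + (1 - self.beta) * self.w[j])
                return j
        self.w.append(i.copy())  # no node passed vigilance: commit a new one
        return len(self.w) - 1
```

Presenting the same pattern twice returns the same category, while a sufficiently different pattern fails the vigilance test and commits a new node, which is the incremental-learning behaviour described below.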
Fig. 14. The Fuzzy ARTMAP neural network architecture.
Tj is calculated for each node, and the node with the maximum Tj is selected as the winner. The match value Vj is then checked: if Vj is less than the vigilance parameter, the node with the next largest Tj is selected as the winner and the process is repeated. This continues until a winner is successfully found. All the weights connected to the winning node are then updated based on equation (14). For the learning phase, the extracted features are passed as inputs to ARTa, and the target character, converted to an 8-bit ASCII code, is passed as the input to ARTb. The Fuzzy ARTMAP learning algorithm can be described as follows:

1. Apply an input/desired output vector pair (I0/O0) to the F0a and F0b layers, respectively.
2. F1a and F1b complement-code the original input/desired output vectors.
3. Using the Fuzzy ART algorithm, ARTb classifies the desired output vector into category K.
4. Using the Fuzzy ART algorithm, ARTa classifies the input vector into category J.
5. The inter-ART module tries to form an association from the ARTa category to the ARTb category.
6. If I0 predicts an incorrect O0, a match-tracking mechanism is triggered. This mechanism increases the ARTa vigilance parameter, ρa, by a minimum amount. Go to step 4 to search for another suitable category.
7. Modify the links between F2a and Fab to encode the association.

The recognition algorithm is the same as the learning algorithm except that it has no match-tracking mechanism and no modification is made to the
weight of the ART modules. When ARTa fails to classify the input pattern, the character is marked as unrecognized.

CONCLUSION

Automated data entry through handwritten-filled forms can be viably applied in many organizations that handle large-scale form processing, for fast and efficient data storage. The trend now is towards the design of better recognition algorithms with higher accuracy. The automated data entry system that we are currently developing uses a relatively new neural network paradigm, the Fuzzy ARTMAP, which has the advantages of incremental learning and fast convergence [7, 8]. However, to be successful, it has to be tightly coupled with good image processing and feature extraction techniques. Once the system can be effectively designed and commercialized, labour-intensive manual data entry can gradually be eliminated in many organizations in the near future. The use of this system is in line with the government's policy of promoting IT and automation.

REFERENCES

[1] Dorronsoro, J., Fractman, G., Santa Cruz, C., "Large scale neural form recognition," Industrial Applications of Neural Networks, 1998, pp. 354-362.
[2] Myler, H.R., Weeks, A.R., Computer Imaging Recipes in C, London: Prentice-Hall, 1993.
[3] Otsu, N., "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 1, 1979, pp. 62-66.
[4] O'Gorman, L., Kasturi, R., Document Image Analysis, California: IEEE Computer Society Press, 1995.
[5] Carpenter, G.A., Grossberg, S., et al., "A Self-Organizing Neural Network For Supervised Learning, Recognition, and Prediction," IEEE Communications Magazine, 1992, pp. 38-46.
[6] Carpenter, G.A., Grossberg, S., et al., "Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps," IEEE Transactions on Neural Networks, Vol. 3, 1992, pp. 698-712.
[7] Tay, Y.H., Handwritten Character Recognition by Fuzzy ARTMAP Neural Network with Application to Postcode Recognition, Thesis, Universiti Teknologi Malaysia, 1997.
[8] Tay, Y.H., Khalid, M., "Comparison of Fuzzy ARTMAP and MLP Neural Networks in Handwritten Character Recognition," Proceedings of the International Federation of Automatic Control (IFAC) Symposium on Artificial Intelligence in Real-Time Control (AIRTC'97), 1997, pp. 363-37.