IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 29, NO. 2, MAY 1999
Automatic Data Capture for Geographic Information Systems
Liang-Hua Chen, Hong-Yuan Liao, Jiing-Yuh Wang, and Kuo-Chin Fan, Member, IEEE
Abstract—This paper presents a map interpretation system for the automatic extraction of high-level information from scanned images of Chinese land register maps. Our map interpretation system consists of three main components: text/graphics separation, parcel extraction, and rotated character recognition. Our approach to text/graphics separation is based on a simple yet effective rule: the feature points of characters are more compact than those of graphics. In the parcel extraction process, the proposed algorithm traces the branches between feature points to extract polygon structures from line drawings. Our character recognition method is based on the matching of extracted strokes using a neural network. The text/graphics separation and character recognition techniques are robust to the rotation and writing style of characters. Another advantage of our separation algorithm is that it can successfully extract a character connected to a graphical line. Experimental results have shown that the proposed system is effective for the data capture of geographic information systems.

Index Terms—Character recognition, map interpretation, pattern recognition, text/graphics separation.
I. INTRODUCTION
Recently, much attention has been devoted to geographic information systems (GIS). A GIS is a computer system that allows users to collect, manage, and analyze large volumes of spatial data. GIS has many applications in everyday life, such as transportation, urban planning, and public facilities management. The first step in building a GIS is data input. Until now, most spatial data, such as maps, have mainly been drawn by hand on paper. Thus, maps stored in paper form need to be converted into digital form for integration into geographic databases. Manual conversion of maps by an operator using a digitizing tablet, however, is time-consuming, costly, and error-prone. The development of a more efficient map input system is, therefore, of substantial importance. One solution is to use a scanner as the input device and to apply image analysis techniques to automatically convert a paper-based map into atomic objects that can be interpreted and processed by a computer.

Manuscript received June 9, 1996; revised May 30, 1998.
L.-H. Chen is with the Department of Computer Science and Information Engineering, Fu Jen University, Hsinchuang, Taipei, Taiwan, R.O.C. (e-mail: [email protected]).
H.-Y. Liao is with the Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.
J.-Y. Wang is with Alcatel Telecommunication, Tuchen, Taipei, Taiwan, R.O.C.
K.-C. Fan is with the Institute of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan, R.O.C.
Publisher Item Identifier S 1094-6977(99)02762-5.
The objective of an automatic map interpretation system is to generate a symbolic description of all map entities and their spatial relationships, which is then stored in a database. Several interpretation systems have been developed to solve the task of automatic map input [1]–[9]. However, most of them focus only on graphical line vectorization and symbol recognition. Few systems can recognize the rotated characters on a map in order to extract attribute information such as street names. Moreover, they utilize domain knowledge (such as the legend and drawing rules of maps) to drive the interpretation process, which makes these techniques applicable only to some special types of maps. Because the maps of each country have their own distinctive features, current automatic map input techniques cannot be applied directly to Chinese maps. In this paper, we present an interpretation system for Chinese cadastral maps (i.e., Chinese land register maps). Our system has three components: text/graphics separation, parcel extraction, and rotated character recognition. Although the parcel extraction step is driven by the semantics of our cadastral maps, the text/graphics separation and rotated character recognition algorithms are quite general; both can be extended to other document interpretation problems. In the next section, a brief description of a Chinese cadastral map is given and some relevant issues of the map interpretation problem are discussed. The three components of our map interpretation system are described in Sections III–V. The performance evaluation of our system is presented in Section VI. Finally, some remarks on our approach are given in Section VII.

II. BACKGROUND

Fig. 1 shows an example of our cadastral map (also called a land register map). It is a monochromatic line drawing at a scale of 1 : 1000. Cadastral maps describe the geometry of land properties within a geographical context.
They divide land territory into a number of polygons, each representing a piece of land. In cartography, each such polygon is called a parcel. As shown in Fig. 1, each parcel is owned by one legal person and identified by a unique parcel number. Each parcel is also labeled with a Chinese character to indicate the usage of the land, such as building, farm, forest, or dry land. To extract complete land registry information from the scanned image of a map, the output of our system is a set of parcel descriptions including polygon coordinates, parcel number, and usage. Several problems often occur during the process of map interpretation. In a map, characters are usually mixed with graphics. Therefore, characters must be separated from graph-
1094–6977/99$10.00 1999 IEEE
Fig. 1. Chinese cadastral map.
ical lines before characters can be recognized. Most previous text/graphics separation algorithms are based on the classification of connected components in the image [10], [11]. However, this approach fails when characters touch or cross graphical lines. To our knowledge, no algorithm has been proposed to extract Chinese characters connected to graphics. Furthermore, the characters on a map may be written in any style and appear in any orientation. To make the situation even more complicated, these characters are often isolated and offer no contextual information for recognition. Since current Chinese optical character recognition (OCR) techniques allow the target character to be rotated only slightly (perhaps because the number of character categories is large), we need to develop a new method to recognize Chinese characters rotated at large angles (but less than 90°). To overcome the above difficulties, we propose in our system a robust technique to extract and recognize the Chinese and numeral characters on a map. Our text/graphics separation algorithm is based on clustering the feature points in an image. Using this algorithm, we can separate characters from graphics regardless of the writing style and orientation of the characters, and we can solve the problem of characters touching or crossing lines. Our approach to character recognition is based on the matching of extracted strokes using neural networks. Our OCR technique is also invariant to the rotation and writing style of characters. To extract parcel structures and their attributes from a map, our map interpretation system is made up of three main components: 1) text/graphics separation; 2) parcel extraction; 3) rotated character recognition. Each of these components is described in the following sections.

III. TEXT/GRAPHICS SEPARATION

Separating characters from graphics is the first step toward automatic map interpretation. Our approach is based on clustering the feature points in an image.
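As a concrete illustration of the feature-point detection this section relies on, the following hypothetical sketch computes the crossing number of a skeleton pixel and applies the 1/3/4 test defined below. The function names and the list-of-lists skeleton representation are our own, not the paper's:

```python
# Hypothetical sketch (not the authors' code): crossing-number test used to
# detect feature points on a one-pixel-wide skeleton.

def crossing_number(skel, r, c):
    """Count 0->1 transitions while tracing the 8-neighbors of (r, c) once.

    `skel` is a 2-D list of 0/1 values; the traversal below visits the
    neighbors counterclockwise, starting from the east neighbor.
    """
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    ring = [skel[r + dr][c + dc] for dr, dc in offsets]
    return sum(1 for i in range(8)
               if ring[i] == 0 and ring[(i + 1) % 8] == 1)

def is_feature_point(skel, r, c):
    # Terminal points (crossing number 1) and 3-/4-fork junctions (3 or 4).
    return skel[r][c] == 1 and crossing_number(skel, r, c) in (1, 3, 4)
```

On a one-pixel-wide skeleton, a value of 1 marks a terminal point and 3 or 4 marks a fork junction; ordinary interior line pixels give 2 and are not feature points.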
Fig. 2. Point P with different crossing numbers.

Since the feature point is central to our text/graphics separation algorithm,
we define it as follows. The crossing number of a bilevel pixel is the number of 0 (white) to 1 (black) transitions encountered when the pixel's 8-connected neighbors are traced once in the counterclockwise direction, i.e., the number of distinct line branches meeting at the pixel. A feature point is a point whose crossing number is 1, 3, or 4. As shown in Fig. 2, there are three kinds of feature points: terminal points (crossing number 1), 3-fork junction points (crossing number 3), and 4-fork junction points (crossing number 4).

A. Preprocessing

The objective of preprocessing is to accurately locate the feature points of a scanned image for subsequent processing. The functional steps of preprocessing are binarization, thinning, and feature point extraction. Since the map is scanned as a 256-gray-level image, we must convert it into a bilevel (i.e., black and white) format so that object regions can be separated from the background. Binarization achieves this by selecting a gray-level threshold and classifying each pixel as foreground or background according to whether its gray level is greater or lower than the threshold. In addition, the scanned image contains specks of noise caused by the poor quality of the paper. This noise must be removed to facilitate the automatic analysis process; we filter out all specks smaller than a given size in a preliminary pass. To reduce the amount of information required to represent the contents of the original drawings, a thinning algorithm is applied to the binary image to obtain the skeleton of the line drawings. The skeleton of a line pattern is a pixel distribution that is one pixel thick and lies wholly within the original line. Thinning is done by iteratively removing edge pixels that can be deleted without destroying connectivity. In our work, we adopt the thinning algorithm proposed by Zhang and Suen [12]. To extract feature points, we inspect the connectivity of each point of the skeleton within its 3 × 3 neighborhood. If the crossing number of a point is 1, 3, or 4, then this point is
CHEN et al.: AUTOMATIC DATA CAPTURE
Fig. 3. If the cluster diameter threshold is set too large, points A and B will be misclassified into character clusters.
identified as a feature point. As shown in Fig. 2, a detected feature point is either a junction point or a terminal point. Using graph notation, we may represent the line drawings as a set of nodes (feature points) connected by edges (line segments). Therefore, feature points play an important role in representing the topological structure of the original drawings.

B. Feature Points Clustering

To separate Chinese characters from graphics, we use one key property that distinguishes characters from graphics in line drawings (maps): the topological structure of a Chinese character is more complicated than that of a graphical primitive. Thus, each character should contain more feature points than a graphical primitive does. In our separation algorithm, all detected feature points are classified into several data clusters. Each cluster is then categorized as either a character cluster or a graphical cluster depending on its density (i.e., the number of data points in the cluster). Our data clustering algorithm is based on the maximin-distance algorithm [13], but we modify the initial step and the stopping conditions. The algorithm proceeds as follows.

Step 1: By exhaustive search, select the farthest pair of feature points as the two initial cluster centers.
Step 2: For each of the remaining feature points, a) compute its distance to all existing cluster centers and b) select the minimum of these distances.
Step 3: Select the feature point with the maximum of these minimum distances.
Step 4: If this maximum distance is greater than the cluster diameter threshold, the selected point becomes a new cluster center; go to Step 2.
Step 5: Otherwise, stop creating new cluster centers and assign the remaining feature points to their nearest clusters.

To obtain a good clustering result, we must set the cluster diameter threshold properly.
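The clustering steps above can be sketched as follows. This is a hypothetical reconstruction in Python; the tuple representation of points and all names are ours:

```python
# Hypothetical sketch of the modified maximin-distance clustering: the
# `diameter` argument plays the role of the cluster-diameter threshold
# (set to the expected character height in the paper).
import math

def maximin_cluster(points, diameter):
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # Step 1: the farthest pair becomes the two initial cluster centers.
    centers = list(max(((p, q) for p in points for q in points),
                       key=lambda pq: dist(*pq)))
    while True:
        remaining = [p for p in points if p not in centers]
        if not remaining:
            break
        # Steps 2-3: the point whose nearest-center distance is maximal.
        p = max(remaining, key=lambda x: min(dist(x, c) for c in centers))
        # Steps 4-5: promote it only if it lies beyond the threshold.
        if min(dist(p, c) for c in centers) > diameter:
            centers.append(p)
        else:
            break
    # Step 5: assign every point to its nearest center.
    clusters = {c: [] for c in centers}
    for p in points:
        clusters[min(centers, key=lambda c: dist(p, c))].append(p)
    return clusters
```

With a diameter of roughly one character height, the tightly packed feature points of a character collapse into one dense cluster while sparse graphical junctions form small clusters.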
If the cluster diameter is set too large, some feature points of graphical primitives are misclassified into character clusters (see Fig. 3). In our setting, the cluster diameter is set to the height of a Chinese character.

C. Character Extraction

After the feature points clustering process is performed, a set of data clusters is obtained. Next, these clusters are divided into two categories. Our separation principle is that a character cluster is denser than a graphical cluster. Therefore, if
Fig. 4. In the case of a “7” crossing a graphical line, four feature points result; each circle represents a feature point.
the number of elements in a cluster is greater than the cluster density threshold, the cluster is identified as a character cluster; otherwise, it is a graphical cluster. In our work, the cluster density threshold is set to two. By visiting the feature points of the same character cluster, we can trace the thinned image to get the skeleton of an isolated character. While the skeletons of characters are sent to an OCR module, the graphical clusters still need some further processing (see the next subsection). Note that the density of a feature point cluster is a rotation-invariant feature for characters and graphical primitives: a character written in different styles still results in a similar density of feature points for each writing. Thus, our algorithm is independent of the orientation and writing style of the character. Since our separation algorithm is based on the clustering of feature points, no connected component analysis is involved at this stage; therefore, the problem of characters touching or crossing lines can be solved easily. It may seem that our algorithm cannot extract numeral characters, because the number of feature points of these simple characters is less than or equal to the cluster density threshold. However, in many cases where a numeral character touches or crosses a line, enough feature points result for it to be identified as a character cluster (see Fig. 4). After all Chinese characters and some numeral characters (those touching or crossing lines) are extracted, a connected component tracing is performed on the thinned image to identify the remaining numeral characters (i.e., those that do not touch or cross lines).
A component is identified as a numeral character component only if 1) the length-to-width ratio of its circumscribing rectangle (i.e., minimum bounding rectangle) is less than two; 2) the area of its circumscribing rectangle is less than a threshold (400 in our setting); and 3) there are only 0, 1, or 2 feature points in the component.

D. Graphical Line Tracing

Using the location information of the extracted characters, we can trace the thinned image to get the skeleton of the graphical lines. Next, we fill in the gaps caused by characters crossing lines (see Fig. 5). Two line segments are linked only if the following conditions are satisfied: 1) the width of the gap is less than a threshold value; 2) the slopes of the two line segments are almost the same; 3) if a line segment has more than one linking candidate, only the nearest one is chosen.
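The three-way test for numeral components described above can be sketched as follows; this is a hypothetical helper (the bounding-box inputs and names are ours, while the thresholds follow the text):

```python
# Hypothetical sketch of the numeral-component test: the bounding-box ratio,
# area, and feature-point thresholds follow the text (ratio < 2, area < 400,
# at most two feature points); everything else is illustrative.

def is_numeral_component(width, height, n_feature_points,
                         area_threshold=400):
    long_side, short_side = max(width, height), min(width, height)
    ratio_ok = long_side / short_side < 2          # condition 1
    area_ok = width * height < area_threshold      # condition 2
    features_ok = n_feature_points <= 2            # condition 3
    return ratio_ok and area_ok and features_ok
```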
Fig. 5. Gap resulting from a character crossing line problem.
Fig. 6. Because d1 > d2 > d3, gap EG will be erroneously filled.
Unfortunately, the above rules cannot be applied to the case of two nearby parallel lines. As shown in Fig. 6, the gap in one line is wider than the gap in the other (since one line segment is crossed by several characters while the other is crossed by only one character). Because all four line segments have the same slope and the distance between E and G is the shortest, gap EG will be erroneously filled. Therefore, we replace rule 3) with the following rule: if a line segment has more than one linking candidate, the one resulting in the smoothest link is chosen. In the case shown in Fig. 6, we compute the angle between the direction vectors of each candidate pair of segments and link the pair whose angle is smallest, forming a smooth line segment. Finally, this section ends with a demonstration of the performance of the proposed algorithm. The example shows that our algorithm works well even when the character crossing line problem occurs. Fig. 7(a) shows a portion of a cadastral map image. The skeleton of the original image is shown in Fig. 7(b), and the extracted feature points in Fig. 7(c). The graphical image after character extraction is shown in Fig. 7(d), and Fig. 7(e) shows the resultant image after all gaps are filled. In this example, our algorithm also extracts numeral character strings successfully. These numeral characters are very close together and are therefore identified as a single character with a complicated pattern. Since the length-to-width ratio of the extracted pattern is large, we can identify it as a character string rather than a single character. Extracted numeral character strings can then be segmented into individual characters by connected component analysis.

IV. PARCEL EXTRACTION

In this section, we propose an algorithm to extract the polygons (parcels) embedded in the line drawing of a cadastral map. Currently, we only process parcels which are completely in the
field of view. At the border of the map, the drawing covers certain parcels only partially; our algorithm does not extract such incomplete parcels, and it assumes each parcel area is completely bounded by a polygon. After the graphical line tracing stage of the text/graphics separation algorithm, we obtain two databases: a feature point table and a branch table. The coordinates and attribute (crossing number) of each feature point are recorded in the feature point table. A branch is a curve between two feature points; the two feature points of a branch are called the starting feature point and the ending feature point, based on the tracing order. The branch table describes the coordinates of the starting and ending feature points, the length, and the chain code [14] of each branch. The inputs of our parcel extraction algorithm are the feature point table and the branch table; the output is a polygon table. Each entry of the polygon table is a linked list of branches. Fig. 8 illustrates the line drawing of a cadastral map, and the corresponding feature point table, branch table, and polygon table are shown in Tables I–III, respectively. In each linked list of branches, “+” means the positive direction and “−” means the negative direction. Note that branch four is a self-loop. To show the validity of the proposed parcel extraction algorithm, we state the following property of line drawings.

Property: The line drawing does not contain an isolated self-loop, and every branch is shared by two polygons with different directions.

For the sake of a unique representation, we trace and extract all polygons in the counterclockwise direction. With this constraint incorporated, two adjacent polygons must share at least one common branch with different directions. Referring to Fig. 9, there is a branch shared between two counterclockwise polygons; it is traversed in one direction as part of the first polygon and in the opposite direction as part of the second.
By this property, if we add all the signed entries of the polygon table together, the result is zero; this can be verified from Table III. Next, we describe the proposed parcel (polygon) extraction algorithm.

Step 1: Select the first untraced branch from the branch table, denote it the current branch, and add it to the polygon table. Use the starting feature point of the current branch as the starting vertex of the first polygon to be identified; likewise, use the ending feature point of the same branch as the determining point.
Step 2: Among the branches containing the determining point, select the branch which has the minimum positive angle with respect to the current branch (see Fig. 10). Add this new branch to the polygon table and record whether it is traced in the positive or negative direction.
Step 3: Denote the newly selected branch as the current branch and use its ending feature point as the next determining point.
Step 4: If the determining point is equal to the starting vertex, then a polygon has been identified completely; go to Step 5. Otherwise, go to Step 2.
Fig. 7. Experimental result of text/graphics separation: (a) original image; (b) skeleton image; (c) detected feature points; (d) separated graphical image; (e) final image after gaps are filled.
TABLE I FEATURE POINT TABLE
Fig. 8. Line drawing of a cadastral map.
Step 5: Check whether all branches in the branch table have been traced twice. If so, terminate the algorithm; otherwise, go to Step 1 to identify another polygon.

Because of the inherent nature of the algorithm, if there are n polygons embedded in the line drawing, n + 1 polygons will be extracted: n true polygons and one pseudopolygon, which is the outer contour of the drawing.
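The minimum-positive-angle tracing above amounts to enumerating the faces of a planar graph. The following is a hypothetical sketch (names are ours; self-loop branches and the branch-table bookkeeping are omitted): each undirected branch is traced twice, once per direction, and every closed walk found is one polygon.

```python
# Hypothetical sketch of polygon extraction as planar face tracing: at each
# vertex the walk turns onto the edge adjacent in angular order, which
# realizes the minimum-positive-angle rule of Step 2.
import math
from collections import defaultdict

def extract_polygons(points, edges):
    """points: {node_id: (x, y)}; edges: undirected (u, v) branches.
    Returns every closed face as a list of directed edges."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def angle(u, v):
        return math.atan2(points[v][1] - points[u][1],
                          points[v][0] - points[u][0])

    # Sort each node's neighbors counterclockwise by direction.
    for u in nbrs:
        nbrs[u].sort(key=lambda v, u=u: angle(u, v))

    faces, used = [], set()
    for u, v in edges:
        for start in ((u, v), (v, u)):   # each branch is traced twice
            if start in used:
                continue
            face, cur = [], start
            while cur not in used:
                used.add(cur)
                face.append(cur)
                a, b = cur
                ring = nbrs[b]
                # Turn onto the neighbor angularly adjacent to the
                # reverse edge.
                i = ring.index(a)
                cur = (b, ring[(i - 1) % len(ring)])
            faces.append(face)
    return faces
```

For a single square parcel (n = 1) the sketch returns two faces, the parcel itself and the pseudopolygon formed by the outer contour, matching the n + 1 count stated above.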
TABLE II BRANCH TABLE
TABLE III POLYGON TABLE
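As a hypothetical sketch of the records behind the feature point, branch, and polygon tables (the field names are ours; the contents follow the text):

```python
# Hypothetical sketch of the three tables used by parcel extraction.
from dataclasses import dataclass, field

@dataclass
class FeaturePoint:
    x: int
    y: int
    crossing_number: int          # 1, 3, or 4

@dataclass
class Branch:
    start: FeaturePoint           # starting/ending order follows tracing
    end: FeaturePoint
    length: float
    chain_code: list = field(default_factory=list)

@dataclass
class Polygon:
    # Signed branch indices: +i means branch i traced in the positive
    # direction, -i means the negative direction.
    branches: list = field(default_factory=list)
```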
Fig. 9. Illustration of line drawing property.
Fig. 11. Map image after characters are extracted.
Fig. 10. Branch searching strategy.

Fig. 11 shows the map image after the characters are extracted, and Table IV is the corresponding branch table. The coordinates of the feature points are also shown in Fig. 11. The experimental result of parcel extraction is shown in Table V.

V. ROTATED CHARACTER RECOGNITION

The recognition of rotated characters on a map is a difficult task; only a few papers deal with this topic [15]. There are two kinds of characters on our maps: numeral and Chinese characters. Our approach to recognizing numeral characters has been described in another paper [16]. In this paper, we propose a neural network approach to recognizing general handwritten Chinese characters. The input of our character recognition system is the skeleton of an isolated character provided by the text/graphics separation module. In our approach, we first extract the strokes of a character. Then, we derive a set of features from each stroke. For all model characters, these features are fed into the columns of a two-dimensional (2-D) Hopfield neural network; for an unknown target character, its features are fed into the rows. Finally, through an optimization process executed in the neural network, a quantitative similarity measure between the unknown target character and each of the model characters is derived. The details of our approach are described in the following subsections.

A. Stroke Extraction

The purpose of this stage is to extract the set of constituent strokes from a Chinese character. Our approach to stroke extraction is based on the work of Liao and Huang [17]. Each of the extracted strokes is labeled with a number in top-to-bottom, left-to-right order. An example showing the result of the stroke extraction process together with the labeled strokes is given in Fig. 12.

B. Two-Dimensional Hopfield Networks

In this section, the basic concept of the continuous Hopfield network model for matching is introduced. A Hopfield net is composed of numerous highly parallel computing units [18]. It is built from a single layer of neurons, with feedback connections from each unit to every other unit (but not to itself). The weights on these connections are constrained to be symmetric. Generally, we first characterize the problem to be
TABLE IV BRANCH TABLE OF FIG. 11
Fig. 12. Each extracted stroke is labeled with a number.
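As a numerical illustration of the constraint-satisfaction process this subsection introduces, the following hypothetical sketch iterates the neuron states under a compatibility term plus row/column uniqueness penalties. The gradient-style update rule, coefficients, and names are our assumptions, not the paper's exact formulation:

```python
# Hypothetical numerical sketch of 2-D Hopfield matching: W[i, k, j, l] is
# the compatibility of pairing stroke i with k and stroke j with l; neuron
# states V[i, k] are nudged toward compatible, one-to-one pairings.
import numpy as np

def hopfield_match(W, iters=200, step=0.1, uniq=1.0):
    n, m = W.shape[0], W.shape[1]
    V = np.full((n, m), 0.5)          # undecided initial states
    for _ in range(iters):
        # Support from the compatibility term of the energy function.
        support = np.einsum('ikjl,jl->ik', W, V)
        # Penalties push each row and column toward one active neuron.
        row_excess = V.sum(axis=1, keepdims=True) - 1.0
        col_excess = V.sum(axis=0, keepdims=True) - 1.0
        V = np.clip(V + step * (support - uniq * (row_excess + col_excess)),
                    0.0, 1.0)
    return V
```

After the states stabilize, the neurons near 1 give the stroke correspondences that the similarity measure later counts.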
solved by an energy function. Through a constraint satisfaction process performed in the Hopfield net, the matching result is ultimately reflected in the states of the neurons. In general, the rows of the net are arranged to represent the features of an unknown target character, and the columns represent the features of a model character. The state of a neuron reflects the degree of similarity between two nodes (strokes), one from the unknown target character and the other from the model character. The matching process can be characterized as minimizing the following energy function:

E = −(1/2) Σ_i Σ_k Σ_j Σ_l W_ik,jl V_ik V_jl + Σ_i (Σ_k V_ik − 1)² + Σ_k (Σ_i V_ik − 1)²   (1)

where W_ik,jl is the strength of the interconnection between the neuron in row i, column k and the neuron in row j, column l, and V_ik is a state variable which converges to 1.0 if the ith node (stroke) in the unknown target character matches the kth node (stroke) in the model character, and to 0.0 otherwise. The first term in (1) is a compatibility constraint; the second and third terms are uniqueness constraints. For more details about the general Hopfield net based matching algorithm, see the work of Lin et al. [19].

TABLE V POLYGON TABLE OF FIG. 11

C. Matching Feature Selection

For each extracted stroke, we select two features for character matching. One is a local feature, the length of the stroke. The other is a relational feature, defined as the set of distances from the stroke's centroid to the centroids of all other strokes of the same character. An example illustrating both features of a stroke is shown in Fig. 13. The proposed features are both translation and rotation invariant; however, they are not scale invariant. Since our cadastral maps are produced by professional draftsmen who were trained to imitate a specific character size, the characters on a map are almost all of the same size. Thus, no scaling problem occurs during our character recognition task. If this problem does appear on some poor-quality maps, a normalization process by conventional methods will solve it [20]. In the stroke extraction stage, each stroke of the character has been labeled with a specific number. To facilitate matching, each stroke of the unknown target character is assigned a row index and each stroke of the model character is assigned a column index; each index number is equal to the labeled number of the corresponding stroke. An example illustrating this arrangement is shown in Fig. 14.
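The two matching features can be sketched as follows; these are hypothetical helpers, with strokes taken as polylines of (x, y) points, a representation we assume for illustration:

```python
# Hypothetical sketch of the two matching features: the stroke's own length
# (local feature) and the distances from its centroid to the centroids of
# all other strokes (relational feature).
import math

def centroid(stroke):
    xs, ys = zip(*stroke)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def stroke_length(stroke):
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))

def stroke_features(strokes, i):
    c_i = centroid(strokes[i])
    rel = [math.dist(c_i, centroid(s))
           for j, s in enumerate(strokes) if j != i]
    return stroke_length(strokes[i]), rel
```

Both outputs are unchanged by translating or rotating the whole character, which is what makes the matching rotation invariant.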
Fig. 13. Feature set of stroke 2 in a Chinese character.
Fig. 15. (a) Unknown target character; (b)–(d) are model characters.
Fig. 14. Row-column assignment for Chinese character matching.

D. Determination of the Interconnection Strengths

To perform character recognition, the interconnection strength W_ik,jl in (1) is defined as follows:

W_ik,jl = c1 g(l_i, l_k) + c2 g(l_j, l_l) + c3 g(d_ij, d_kl)   (2)

where l_i is the length of the ith stroke in the unknown target character, l_k is the length of the kth stroke in the model character, d_ij is the distance between the centroids of the ith and the jth strokes in the unknown target character, and d_kl is the distance between the centroids of the kth and the lth strokes in the model character. The function g is defined as follows:

g(a, b) = 1 if |a − b| < T, and 0 otherwise   (3)

where T is a threshold value, and a and b are features pertaining to the row and column nodes, respectively. For the symmetric terms, the associated weights should be set equal (i.e., c1 = c2). The weight of the relational feature (c3) is more important than the other two and is set to a higher value.
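The thresholded weight computation of (2) and (3) can be sketched as follows; the coefficient values and threshold here are illustrative assumptions (the paper states only that the two length terms share an equal weight and that the relational term is weighted higher):

```python
# Hypothetical sketch of (2) and (3); c1, c2, c3, and T are illustrative.

def g(a, b, T=5.0):
    # Equation (3): 1 when the two features agree within the threshold.
    return 1.0 if abs(a - b) < T else 0.0

def weight(l_i, l_k, l_j, l_l, d_ij, d_kl, c1=1.0, c2=1.0, c3=2.0):
    # Equation (2): length compatibility of both stroke pairs plus the
    # relational (centroid-distance) compatibility.
    return c1 * g(l_i, l_k) + c2 * g(l_j, l_l) + c3 * g(d_ij, d_kl)
```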
E. Similarity Measure

After the states of the Hopfield net have stabilized, we can count the number of active neurons in the network and use it to measure the degree of matching (or similarity) between the model character and the unknown target character. Since the strokes of a character have different lengths, an importance measure is defined based on each stroke's length: the importance measure of stroke i is I_i = l_i / Σ_j l_j, where l_i is the length of stroke i and the sum runs over all strokes of the character. The procedure for the similarity measure consists of five steps:
Fig. 16. (a) Model character; (b) set of target characters written by 20 different individuals.
1) Initialize row match, column match, row zero, and column zero to zero.
2) Count the number of 1's in each row. If there is no 1 in row i, increment row zero by one, skip to the next row, and leave row match unchanged. If there is only one 1 in row i, add I_i to row match. If there are k 1's (k > 1) in row i, add I_i/k to row match. Repeat this for all the rows.
3) Do the same calculation for all the columns and update column match and column zero.
4) Update row match and column match by the following equations:
row match = row match × (number of rows − row zero) / number of rows
column match = column match × (number of columns − column zero) / number of columns
5) Pick the smaller of row match and column match, and take the result as the similarity measure.
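The five steps above can be sketched as follows; this is a hypothetical implementation in which `V` is the stabilized 0/1 neuron matrix and the importance values are the length ratios defined earlier:

```python
# Hypothetical sketch of the five-step similarity measure. `row_imp` and
# `col_imp` hold the importance measure I = l_i / (sum of stroke lengths)
# for the target and model strokes, respectively.

def similarity(V, row_imp, col_imp):
    n_rows, n_cols = len(V), len(V[0])
    row_match = col_match = 0.0
    row_zero = col_zero = 0
    for i in range(n_rows):                      # step 2
        k = sum(V[i])
        if k == 0:
            row_zero += 1
        else:
            row_match += row_imp[i] / k          # I_i if one 1, I_i/k else
    for j in range(n_cols):                      # step 3
        k = sum(V[i][j] for i in range(n_rows))
        if k == 0:
            col_zero += 1
        else:
            col_match += col_imp[j] / k
    # Step 4: penalize empty rows/columns (unmatched strokes).
    row_match *= (n_rows - row_zero) / n_rows
    col_match *= (n_cols - col_zero) / n_cols
    return min(row_match, col_match)             # step 5
```

A perfect one-to-one matching yields 1.0; multiple candidates in a row or an unmatched stroke both pull the measure down, as the discussion below explains.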
F. Discussion

Given an unknown target character and a large number of model characters, the degree of matching can be derived by comparing the unknown target character with each of the model characters in the final state of the Hopfield net. Ideally, there should be at most one active neuron in each row and column. However, due to the nature of the constraint satisfaction process, it is possible to have more than one candidate in the same row or column in the final state of the network. This situation should be considered unfavorable and hence should decrease the degree of matching. This is the reason why I_i/k is added to the degree of matching instead of I_i when there are k 1's simultaneously existing in the same row or column. When there is no 1 in a row (or column), a stroke of the unknown target character (or model character) has no correspondence in the model character (or unknown target character). Under these circumstances, we define a penalty function to adjust the similarity measure; its definition is given in step 4 of the similarity measure procedure. Clearly, the larger the number of empty rows (or columns), the heavier the penalty. In the stroke extraction stage, the input character may be undersegmented or oversegmented. Therefore, the number of
TABLE VI RESULTS OF AUTOMATIC MAP INTERPRETATION
extracted strokes may not be the expected value. However, the proposed Hopfield net for character matching has a flexible structure that handles the case in which the numbers of extracted strokes in the unknown target character and the model character differ. In other words, the 2-D neuron array may have different numbers of rows and columns, which allows inexact matching [21] to be performed in this net. Even when the numbers of extracted strokes in the two characters differ, the model character most similar to the target character can still be selected with the proposed approach. To demonstrate the power of the proposed approach, we first select a set of similar Chinese characters as model characters [see Fig. 15(b)–(d)]. Although these characters are structurally similar, they are quite different semantically. The unknown input target character is shown in Fig. 15(a). The neuron matrix below each model reflects the stroke correspondences between the input character and the model character. Since Fig. 15(d) has the highest matching rate (similarity measure) with the input character, it is identified as the best matched model character. Note that the input target character is slanted; thus, this example shows that our approach is rotation invariant. Fig. 16(a) shows another model character, and Fig. 16(b) shows the same character written by 20 different individuals. The word below each handwritten character reflects its matching status with the model character shown in Fig. 16(a). Although each individual has his own handwriting style, our system can still recognize over 90% of the set of testing characters. This example shows that our approach is robust to variations in writing style.

VI. PERFORMANCE EVALUATION

The proposed scheme has been implemented on a Sun Sparc-10 workstation in the C language. Experiments were conducted on 20 Chinese cadastral maps. These maps were
Table VI reports scanned at 500 dpi (4096 the experimental results on 20 maps. These maps contain 1268 characters and 337 parcels totally. Only parcels which are completely in the field of view are evaluated. Some characters and parcels are misextracted because of the poor quality of maps. Since character recognition is achieved by matching the unknown target character with each of the model characters, the number of model characters is critical to system performance. Currently, there are 52 different characters (including ten numeral characters) in our cadastral maps. Table VII shows the average time needed to automatically process a Chinese cadastral map. As shown in Table VII, character recognition is the bottleneck of our system. This is mainly caused by the constraint satisfaction process of matching. In general, the convergence time of neural network depends on the complexity of the target character, model characters, and the features used to perform the matching task. If a small number of neurons
TABLE VII AVERAGE PROCESSING TIME PER MAP SHEET
Fig. 17. Parallel implementation of character recognition.
are not convergent after a long period of time, the time-out strategy is used to terminate the algorithm. VII. CONCLUSION We have presented a system for automatic interpretation of Chinese cadastral maps. The proposed system consists of three components: text/graphics separation, parcel extraction, and rotated character recognition. Our approach to text/graphics separation does not use a priori knowledge about the maps; it is only based on a simple yet effective rule: the feature points of characters are more compact than those of graphics. Our algorithm is independent on the orientation and writing style of character. Besides, our separation algorithm can also handle the serious case that characters touch/cross graphical lines. In the parcel extraction process, the proposed algorithm traces the branches between feature points to extract polygon structure from line drawings. Our character recognition method is based on the matching of extracted strokes using a neural network. The proposed Hopfield net can deal with the matching problem in an elegant manner and is able to quantitatively reflect the degree of similarity in the final states of neurons. Our character recognition technique is rotation-invariant and can tolerate the variation of writing style. Experimental results on real map data have shown that the proposed system is effective in the automatic data capture of geographic information systems. The current implementation of character recognition is done by matching the unknown target character with each of the model characters sequentially (one by one). The system performance depends on the number of characters in the model database. In our future work, we will parallelize the matching task by employing the structure as shown in Fig. 17. By expanding the columns of the Hopfield net, a large number of the model characters in the database can be fed into the network and processing simultaneously. 
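The constraint-satisfaction matching described above can be illustrated in miniature. The sketch below is not the authors' Hopfield net; it is a relaxation-labeling stand-in under assumed stroke features (each stroke reduced to a length and an absolute angle; pairwise compatibility uses relative angles and length ratios, both rotation-invariant). The belief matrix plays the role of the 2-D neuron array, may be rectangular when stroke counts differ, and the row maxima give a similarity measure in [0, 1]. All names and tolerances are illustrative assumptions.

```python
import math

def pair_compat(t_i, t_j, m_k, m_l, ang_tol=0.2, len_tol=0.2):
    """+1 if assigning t_i -> m_k and t_j -> m_l preserves the pair's
    relative angle and length ratio (both rotation-invariant), else -1."""
    da_t = abs(t_i[1] - t_j[1])
    da_m = abs(m_k[1] - m_l[1])
    dr = abs(math.log(t_i[0] / t_j[0]) - math.log(m_k[0] / m_l[0]))
    return 1.0 if abs(da_t - da_m) < ang_tol and dr < len_tol else -1.0

def match_similarity(target, model, iters=25):
    """Relaxation-labeling stand-in for Hopfield-style stroke matching:
    p[i][k] is the belief that target stroke i corresponds to model
    stroke k; mutually consistent pairings reinforce each other.
    The belief matrix may be rectangular (len(target) != len(model))."""
    nt, nm = len(target), len(model)
    p = [[1.0 / nm] * nm for _ in range(nt)]
    for _ in range(iters):
        new_p = []
        for i in range(nt):
            row = []
            for k in range(nm):
                # Net support from all other rows, weighted by their beliefs
                q = sum(pair_compat(target[i], target[j], model[k], model[l]) * p[j][l]
                        for j in range(nt) if j != i for l in range(nm))
                row.append(p[i][k] * (1.0 + q / max(1, nt - 1)))
            s = sum(row) or 1.0
            new_p.append([v / s for v in row])  # renormalize each row
        p = new_p
    # Sharp, consistent correspondences push the row maxima toward 1
    return sum(max(row) for row in p) / nt

# Strokes as (length, absolute angle); target is the model rotated by 0.3 rad
model  = [(1.0, 0.0), (1.5, math.pi / 2), (2.0, math.pi / 4)]
target = [(L, a + 0.3) for (L, a) in model]
other  = [(1.0, 0.0), (1.0, math.pi / 3), (1.0, 2 * math.pi / 3)]
```

Because the features are relative rather than absolute, rotating the target leaves every compatibility term unchanged, which mirrors the rotation-invariance claimed for the matching stage.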
The degrees of matching between the unknown target character and all the model characters in the database can then be obtained concurrently in the expanded Hopfield net, which will greatly reduce the processing time of character recognition. Although our current work concentrates on cadastral maps, most of the developed techniques (such as text/graphics separation and character recognition) are quite general and could be applied to other kinds of line drawings, such as engineering
drawings and form documents. Finally, our future work should also be directed toward the interpretation of more complex maps, such as topographic maps and utility maps; this will require some modifications to the graphical symbol recognition method.

ACKNOWLEDGMENT

The authors wish to thank the reviewers of this paper for their many valuable suggestions on the preparation of this manuscript.
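The one-model-at-a-time bottleneck noted in the conclusion can also be attacked at the software level before any network expansion: the target can be dispatched against every model in the database concurrently. The sketch below is only an illustration of that dispatch pattern, not the paper's system; the stroke-label sets and the Jaccard similarity are hypothetical stand-ins for the real stroke-matching score, and true speedup for CPU-bound matching would need process- or hardware-level parallelism such as the expanded Hopfield net of Fig. 17.

```python
from concurrent.futures import ThreadPoolExecutor

def similarity(target, model):
    """Toy stand-in for the stroke-matching score: Jaccard overlap of
    coarse stroke labels (the real system uses Hopfield matching)."""
    union = len(target | model)
    return len(target & model) / union if union else 0.0

def recognize(target, model_db, workers=4):
    """Score the unknown character against every model concurrently
    and return the best-matching label with its similarity."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = dict(zip(model_db,
                          pool.map(lambda name: similarity(target, model_db[name]),
                                   model_db)))
    return max(scores.items(), key=lambda kv: kv[1])

# Hypothetical model database: character label -> set of stroke labels
model_db = {
    "7": {"horizontal", "right-falling"},
    "1": {"vertical"},
    "4": {"horizontal", "vertical", "left-falling"},
}
best, score = recognize({"horizontal", "vertical", "left-falling"}, model_db)
```

Since every model is scored independently, the pattern scales with the database size exactly as the sequential loop does in work, while letting the per-model scores be computed side by side.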
Liang-Hua Chen received the B.S. degree in information engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1983, the M.S. degree in computer science from Columbia University, New York, NY, in 1988, and the Ph.D. degree in computer science from Northwestern University, Evanston, IL, in 1992. After completing two years of military service, he joined the Institute of Information Science, Academia Sinica, Taipei, as a Research Assistant. From March to July 1992, he was a Senior Engineer at Special System Division, Institute for Information Industry, Taipei. He is currently an Associate Professor in the Department of Computer Science and Information Engineering, Fu Jen University, Taipei. His research interests include computer vision and pattern recognition.
REFERENCES

[1] S. Suzuki, M. Kosugi, and T. Hoshino, “Automatic line drawing recognition of large-scale maps,” Opt. Eng., vol. 26, pp. 642–649, July 1987.
[2] R. Kasturi and J. Alemany, “Information extraction from images of paper-based maps for query processing,” IEEE Trans. Softw. Eng., vol. 14, pp. 671–675, May 1988.
[3] M. T. Musavi, M. V. Shirvaikar, E. Ramanathan, and A. R. Nekovei, “A vision based method to automate map processing,” Pattern Recognit., vol. 21, pp. 319–326, 1988.
[4] S. Suzuki and T. Yamada, “MARIS: Map recognition input system,” Pattern Recognit., vol. 23, pp. 919–933, 1990.
[5] L. Boatto, V. Consorti, M. D. Buono, and S. D. Zenzo, “An interpretation system for land register maps,” Computer, vol. 25, pp. 25–33, July 1992.
[6] S. Shimotsuji, O. Hori, and M. Asano, “A robust recognition system for a drawing superimposed on a map,” Computer, vol. 25, pp. 56–59, July 1992.
[7] H. Yamada, K. Yamamoto, and K. Hosokawa, “Directional mathematical morphology and reformalized Hough transformation for the analysis of topographic maps,” IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 380–387, Apr. 1993.
[8] N. Ebi, B. Lauterbach, and W. Anheier, “An image analysis system for automatic data acquisition from colored scanned maps,” Machine Vis. Applicat., vol. 7, pp. 148–164, 1994.
[9] T. K. ten Kate, J. E. den Hartog, and J. J. Gerbrands, “Knowledge-based interpretation of utility maps,” Comput. Vis. Image Understand., vol. 63, pp. 105–117, Jan. 1996.
[10] F. M. Wahl, K. Y. Wong, and R. Casey, “Block segmentation and text extraction in mixed text/image documents,” Comput. Graph. Image Process., vol. 20, pp. 375–390, Dec. 1982.
[11] L. A. Fletcher and R. Kasturi, “A robust algorithm for text string separation from mixed text/graphics images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 10, pp. 910–918, Nov. 1988.
[12] T. Y. Zhang and C. Y. Suen, “A fast parallel algorithm for thinning digital patterns,” Commun. ACM, vol. 27, pp. 236–239, 1984.
[13] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
[14] H. Freeman, “Computer processing of line drawing images,” Comput. Surv., vol. 6, pp. 57–98, Mar. 1974.
[15] S. D. Zenzo, M. D. Buono, M. Meucci, and A. Spirito, “Optical recognition of hand-printed characters of any size, position, and orientation,” IBM J. Res. Develop., vol. 36, pp. 487–501, May 1992.
[16] L.-H. Chen and J.-Y. Wang, “A system for extracting and recognizing numeral strings on maps,” in Proc. 4th Int. Conf. Document Analysis and Recognition, Ulm, Germany, Aug. 1997, pp. 337–341.
[17] C.-W. Liao and J. S. Huang, “Stroke segmentation by Bernstein–Bézier curve fitting,” Pattern Recognit., vol. 23, pp. 475–484, 1990.
[18] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” in Proc. Nat. Acad. Sci., vol. 79, pp. 2554–2558, Apr. 1982.
[19] W.-C. Lin, F.-Y. Liao, C.-K. Tsao, and T. Lingutla, “A hierarchical multiple-view approach to three-dimensional object recognition,” IEEE Trans. Neural Networks, vol. 2, pp. 84–92, Jan. 1991.
[20] G. Srikantan, D.-S. Lee, and J. T. Favata, “Comparison of normalization methods for character recognition,” in Proc. 3rd Int. Conf. Document Analysis and Recognition, Montreal, P.Q., Canada, Aug. 1995, pp. 719–722.
[21] L. G. Shapiro and R. M. Haralick, “Structural description and inexact matching,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-3, pp. 504–515, Sept. 1981.
Hong-Yuan Liao received the B.S. degree in physics in 1981 from National Tsing-Hua University, Hsinchu, Taiwan, R.O.C., and the M.S. and Ph.D. degrees in electrical engineering from Northwestern University, Evanston, IL, in 1985 and 1990, respectively. From June 1990 to June 1991, he was a Research Associate in the Computer Vision and Image Processing Laboratory, Northwestern University. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taipei, Taiwan, as an Assistant Research Fellow. Since March 1995, he has been an Associate Research Fellow of the same institute. His research interests are in computer vision, neural networks, and wavelet-based image analysis. Dr. Liao was the recipient of Academia Sinica’s Young Investigators’ Award in 1998. He served as the program chair of the first International Symposium on Multimedia Information Processing in 1997.
Jiing-Yuh Wang received the B.S. degree in information science from Tung-Hai University, Taiwan, R.O.C., in 1990 and the M.S. degree in information engineering from National Central University, Taiwan, in 1994. He is currently with Alcatel Telecommunication Company, Taipei, Taiwan. His research interests include pattern recognition and geographical information systems.
Kuo-Chin Fan (M’88) was born in Hsinchu, Taiwan, R.O.C., on June 21, 1959. He received the B.S. degree in electrical engineering from National Tsing-Hua University, Hsinchu, in 1981, and the M.S. and Ph.D. degrees in 1985 and 1989, respectively, both from the University of Florida, Gainesville. In 1983, he was with the Electronic Research and Service Organization, Taiwan, as a Computer Engineer. From 1984 to 1989, he was a Research Assistant at the Center for Information Research, University of Florida. In 1989, he joined the Institute of Computer Science and Information Engineering, National Central University, Taiwan, where he became a Professor in 1994. He was the Chairman of the department from 1994 to 1997. His current research interests include image analysis, optical character recognition, and document analysis. Dr. Fan is a member of SPIE.