Reading Street Names from Maps - Technical Challenges

G. Nagy†, A. Samal‡, S. Seth‡, T. Fisher‡, E. Guthmann‡, K. Kalafala†, L. Li‡, S. Sivasubramaniam†, and Y. Xu†

† Rensselaer Polytechnic Institute, Troy, New York 12180, USA
‡ University of Nebraska-Lincoln, Lincoln, Nebraska 68588, USA
Abstract
An unsolved problem in automated map conversion is the recognition of street lines and their associated names. This paper describes our efforts to overcome the challenges posed by this endeavor. The goal of this research is to develop techniques for conversion, evaluate their effectiveness, and estimate the cost of conversion. Since any completely automated approach is unlikely to be perfect, interactive error correction is integrated into automated conversion, and all operator corrections are logged. This log will be used to adapt system parameters for automated processing and to develop a model for the cost of conversion. Our emphasis is not on the conversion of a single map, but on a batch of maps of the same type.
1 Introduction

A surprisingly difficult problem in converting urban and semi-urban maps to computer-readable format is identifying street lines and associating them with street names. Providing this information in digital form simplifies updates, distribution, and archival. In addition, integrating such information with a Geographic Information System (GIS) opens up applications like urban planning, routing, and automated vehicle guidance. Figure 1 illustrates the anticipated output of the street extraction process. We emphasize that the goal of our research is to address the challenges, not to build a complete system for map conversion. One problem that permeates all facets of processing is the size of the image. A typical USGS 7.5-minute quadrangle map digitized at 1000 dpi (dots/inch) and 8 bpp (bits/pixel) requires 425 megabytes of space (uncompressed). While this is well within the boundaries of current technology for storage, processing information of this volume does take time and must be handled efficiently. In addition to developing techniques for conversion, one of our major objectives is to develop an approach that will improve with repetitive conversion of similar maps, resulting in a gradual reduction in cost. Therefore, adaptation is a key element and must be incorporated at all levels of processing. We intend to develop an accurate cost model for the conversion process.
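To see where this number comes from, here is a rough back-of-the-envelope check; the 18 in by 24 in sheet size is an assumption taken from the sub-task discussion in Section 4, and the exact figure depends on the scanned extent:

```latex
% Uncompressed size of a full sheet scanned at 1000 dpi, 8 bpp (1 byte/pixel),
% assuming a printed area of roughly 18 in x 24 in (see Section 4).
\[
  18\,\mathrm{in} \times 24\,\mathrm{in} \times (1000\,\mathrm{dpi})^{2}
  \times 1\,\frac{\mathrm{byte}}{\mathrm{pixel}}
  \;=\; 4.32\times 10^{8}\ \mathrm{bytes}
  \;\approx\; 425\ \mathrm{MB}.
\]
```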
Figure 1: A section of a USGS map and the desired result of conversion.

We expect that the principal cost of conversion will be the operator involvement in fixing errors. To build a realistic cost model, one must provide accurate instrumentation to log the operator interaction. The software package used for our research is ARC/INFO by ESRI [1]. It has many useful tools that have helped our research; even so, it has several limitations, which are discussed later in this paper. There is much published literature on geographic information systems that deals specifically with map-image processing (see [2] for an early survey and [3] for a more recent one). Recent work on model-based map interpretation is reported in [4, 5]. An approach that uses contextual information to recognize map features and associate them with labels is presented in [6]. It should be pointed out, however, that there is no significant body of research that deals with adaptation and cost models, two of the central issues in this research.
2 Issues in reading street names

In the USGS map, street lines and names are all printed in black. Thus, separating the black layer with a digital color filter provides a starting point for processing the image. However, major challenges remain even after this simple and effective step of information reduction. The text and line layers frequently overlap; text strings appear in many orientations; the line layer contains not only streets but also grid lines and political boundaries; similarly, the black text layer contains not just the street names but also elevation values and the names of neighborhoods, political districts, schools, hospitals, other built-up areas, and physiographic and topographic features. Our approach to automated conversion involves four related steps: (a) identification of the sub-layers containing street lines and names, (b) vectorization of the line sub-layer, (c) recognition of names, and (d) association of street names with streets. Each step involves its own challenges, as described below. Further details can be found in reference [7].
2.1 Identification of street and text layers
Topographic maps are produced by overlay techniques, which result in frequent juxtaposition of text, icons, and streets. Furthermore, such overlap may occur within the same color plane. The goal here is to identify the street lines and then isolate street names from the remaining objects. The first step is to perform color separation, which produces a black-layer image. The black layer contains street lines, grid lines, icons, and text. Several problems complicate segmentation of these objects: overlap among the entities, broken lines, and small traffic islands that are difficult to distinguish from characters. An approach to separating non-overlapping text from graphics is described in [8]. Extracting lines from images with line drawings and overlapping text is presented in [9]. A connected component analysis is performed on the black layer. Each component is classified as text, icon, or line, based on the attributes of its bounding box. Currently, we use black density and maximum length (either bounding-box width or height, whichever is greater). The initial classification is improved by feedback from line processing, which allows the removal of overlapping characters from the street lines. Vectors that line processing considers suspicious provide clues as to where non-street objects lie in the street layer. Additionally, missing features (such as small traffic islands) may be identified by such means. An attempt is made to group non-street objects into word blocks. Those which remain isolated are considered icons. Street names are distinguished from other labels by proximity to identified street lines, font, and case.
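The classification rule itself can be stated in a few lines. The sketch below is illustrative only: the threshold values are hypothetical placeholders (the actual values are map-specific and are among the parameters subject to the adaptation described in Section 4), and it assumes the connected components and their bounding-box statistics have already been computed.

```python
# Illustrative sketch of the bounding-box classification described above.
# Thresholds are placeholders, not the values used in the actual system.

def classify_component(width, height, black_pixels,
                       max_char_len=60,       # hypothetical: longest side of a character box (pixels)
                       min_line_len=200,      # hypothetical: shortest side of a line-like box (pixels)
                       max_line_density=0.15):
    """Label one connected component of the black layer as 'text', 'line', or 'icon'."""
    max_len = max(width, height)                    # maximum bounding-box length
    density = black_pixels / float(width * height)  # black density within the box

    if max_len >= min_line_len and density <= max_line_density:
        return "line"   # long, sparse components: street or grid lines
    if max_len <= max_char_len:
        return "text"   # small components: candidate characters (traffic islands may land here too)
    return "icon"       # everything else; feedback from line processing may later reassign it
```

Components misclassified at this stage (e.g., a traffic island labeled as text) are exactly the cases the feedback from line processing is meant to catch.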
2.2 Street line processing
The street-layer image is vectorized using ArcScan [1]. Some overlapping noise can be skipped by requesting ArcScan to trace only those lines whose characteristics (e.g., thickness, line-width variation) fall within predetermined limits. The resulting vectors are examined in order to identify street lines. The decision is based on three measurements: line thickness, line length, and mating with other line segments under constraints of proximity, similarity of orientation, and apposition. In the next stage, vectors are grouped into non-overlapping polygons. In situations where this is not possible, a localized re-vectorization may be applied in order to disambiguate the topology.
Figure 2: Clockwise from top left: black layer, street layer, icon layer, and text layer.

A graph is constructed in which polygons (city blocks) are the nodes. Two nodes are connected by an edge if and only if the corresponding polygons share a street. The faces of this graph correspond to intersections. After the intersections are identified and analyzed, a street-arc graph is produced, which provides intersection coordinates and center-line geometry for each street.
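A minimal sketch of the block-adjacency graph construction, assuming the polygons have already been formed and each shared boundary has been attributed to a street segment (the input format here is hypothetical):

```python
from collections import defaultdict

def build_block_graph(shared_boundaries):
    """shared_boundaries: iterable of (block_a, block_b, segment_id) triples,
    one per street segment that two polygons (city blocks) have in common.
    Returns an adjacency map: block id -> set of neighbouring block ids."""
    graph = defaultdict(set)
    for block_a, block_b, _segment in shared_boundaries:
        graph[block_a].add(block_b)
        graph[block_b].add(block_a)
    return graph

# Two nodes are adjacent iff their polygons share a street; the faces
# (minimal cycles) of this planar graph then correspond to intersections.
```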
2.3 Identification of street names
Street names on maps are often overlapped by street and grid lines. These lines are noise from the point of view of optical character recognition (OCR) and cause many errors for commercial OCR packages. In addition to putting effort into separating the text and line-art layers in order to derive clean street-name images, we developed a document-specific OCR system that is less vulnerable to character segmentation errors.
Table 1: Summary of OCR errors.

  No. of characters (training set)            197
  No. of characters (test set)                154
  No. of classes (with template)               27
  No. of test characters with template        148
  Errors on test characters with template    8 (5%)
  Commercial OCR error on same characters   34 (23%)
This system is based on the observation that all street names on the map are in a single font and in one of three type sizes. In the training stage, character templates are obtained from a few operator-labeled street names by the word-shifting algorithm described in [10]. These templates are used to recognize the rest of the street names on the same map with a level-building algorithm [11]. There are two advantages to this approach:

1. Both the template extraction and street-name recognition procedures can tolerate noisy input images. This allows the OCR to be less dependent on sub-layer separation. Furthermore, the recognition result provides useful feedback for sub-layer separation, so that improved segmentations may be produced.

2. After the operator corrects some of the initial street-name recognition results, new character templates can be added to improve further recognition.

The results of a street-name identification experiment are shown in Table 1. Character templates were extracted from two 3″ × 3″ sections (not shown here). Recognition was performed on the test chip (Figure 1).
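As a simplified illustration of how labeled templates can drive recognition without prior character segmentation, the sketch below scores every template at every horizontal offset of a word image. This is plain correlation-style matching, not the level-building algorithm of [11], and all names are hypothetical:

```python
import numpy as np

def best_template_match(word_image, templates):
    """word_image: 2-D array of {0,1}; templates: dict mapping character label
    to a 2-D {0,1} template.  Returns (label, x_offset, score) of the best match."""
    best = (None, -1, float("-inf"))
    h, w = word_image.shape
    for label, tpl in templates.items():
        th, tw = tpl.shape
        if th > h or tw > w:
            continue                         # template does not fit this word image
        for x in range(w - tw + 1):
            window = word_image[:th, x:x + tw]
            # reward coinciding black pixels, penalize disagreements
            score = np.sum(window * tpl) - 0.5 * np.sum(np.abs(window - tpl))
            if score > best[2]:
                best = (label, x, score)
    return best
```

Because matching is done against whole-word images, a template can still score well when the word is touched by a street or grid line, which is the noise tolerance the paragraph above relies on.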
2.4 Associating streets with street names

Street names are segmented into individual name labels. A typical street name, for example RAINIER AVE, has a specific label (RAINIER) and a generic label (AVE). Each label is associated with the street-line segment that has the best relative orientation, perpendicular displacement, and apposition to the label baseline. A stack-based tracing algorithm links together segments with associated specific and generic labels (not every line segment has an associated label). Across intersections, the routine considers the possible continuations and chooses the best option based on constraints of collinearity, node degree, and street width [7].
At ambiguous intersections, the most likely path is determined using the criteria mentioned above. Tracing is discontinued if a conflict is encountered (e.g., consecutive generic labels). In this case, the ambiguous intersection is revisited and the next most likely option is pursued. Repeated specific labels confirm the tracing path.
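A sketch of the stack-based tracing with backtracking; the graph representation and the ranking function are placeholders standing in for the collinearity, node-degree, and street-width criteria of [7]:

```python
def trace_street(start_segment, continuations, rank, conflict):
    """Depth-first trace of one named street with backtracking.
    continuations: segment -> list of candidate continuation segments across
        the next intersection.
    rank: list of candidates -> same list ordered best-first (collinearity,
        node degree, street width).
    conflict: path -> True if the path is inconsistent, e.g. it has collected
        two different generic labels such as AVE followed by ST."""
    stack = [(start_segment, [start_segment])]
    best_path = [start_segment]
    while stack:
        segment, path = stack.pop()
        if conflict(path):
            continue        # abandon this branch; the next-best option is already on the stack
        if len(path) > len(best_path):
            best_path = path
        # push candidates worst-first so the best-ranked continuation is explored next
        for nxt in reversed(rank(continuations(segment))):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return best_path
```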
3 Error handling
Types of errors: The types of errors that may occur during automated
processing can be broken down according to the phase in which they occur. This list is by no means exhaustive, but it contains some of the more common errors encountered.
Sub-layer separation: Color separation may be imperfect. Characters, streets, or icons may not be correctly identified.

Line processing: Vectors which are not street lines (e.g., characters or grid lines) are included. Street lines may be missing. Intersections may be inaccurately analyzed.

Optical character recognition: Characters may be recognized inaccurately.

Association: At "Y" intersections, it is possible that the wrong fork is chosen as the continuation of the current street. Also, if a name is repeated along a street line (e.g., Washington St. printed more than once along the same street), and one of the instances is not correctly read by the OCR phase, separate streets may be created.
Interactive error correction: Our work differs from others of its kind in that it includes operator interaction as an integral part of the conversion process. This imposes additional requirements on the operator interface:

- Simple commands to allow easy training and rapid correction of errors. We do not expect the operator to be familiar with the details of automatic conversion, but to use only common map knowledge to enter training data and correct errors.

- Adequate provisions for manual conversion of an entire map segment. This is important for predicting the reduction in operator time between manual and automated map conversion.

- A log of all operator actions to provide feedback. The interface must provide enough information about operator corrections to enable us to determine the source of conversion errors and fine-tune operational parameters.

- A log of operator time. This is required to construct and verify our cost model.

Many of the available GIS packages provide the resources necessary to realize these design goals, but none provides a user interface that is particularly well suited to our purposes. ARC/INFO as packaged has several disadvantages. The ARC/INFO command-line interface is difficult to learn and requires a sequence of many commands to implement conceptually high-level operator actions. ArcTools, the interface that comes with ARC/INFO, is still too generalized for our purposes. Facilities for logging operator actions are primitive and do not provide enough information to accurately assess errors and adjust parameters. For these reasons, we have opted to use the Arc Macro Language (AML) to build a customized operator interface while retaining the power of ARC/INFO's underlying software. The interface provides simplified correction and logs all aspects of the operator's actions.
Operator log: The log records the specific corrective actions the operator has performed. It provides the system with the information necessary for adaptation (discussed in the next section). In addition, a detailed breakdown of the time spent by both the operator and the system on the various correction tasks is recorded. This information is used in the construction of the cost model.
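As an indication of the granularity involved, one log record might look like the following; the field names are purely illustrative, since the actual log is produced through the AML interface:

```python
# Hypothetical shape of a single operator-log record; field names are
# illustrative, not those produced by our AML interface.
log_record = {
    "operation":  "Associate Names",       # Draw Segments | Add Names | Associate Names | ...
    "target":     "RAINIER AVE",           # entity the action applied to
    "error_type": "missing_association",   # what the automated phase got wrong
    "phase":      "association",           # processing phase blamed for the error
    "user_secs":  4.5,                     # operator time spent on the action
    "cpu_secs":   0.2,                     # system time spent on the action
}
```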
4 Adaptation

Devising a system for processing documents of a single type (in this instance, topographic quadrangles) is easier than building a general-purpose system, and document-specific systems can be more effective than general-purpose systems. The disadvantage of task-specific solutions is that a new system must be built, by expert craftsmen, for every new type of document. Furthermore, even a special-purpose system requires many parameters - color values, line widths, typefaces - that can be accurately estimated only by processing large volumes of digitized data. Adaptive or learning systems are intended to obtain the best of both worlds by gradually customizing a general-purpose system to a particular document or family of documents. While the idea is far from new, few experimental results have been reported on adaptive cartographic systems, because it is necessary, before any learning can be demonstrated, to develop a static system without learning. In our case as well, most effort to date has been devoted to developing the basic building blocks, and it is only now that we are beginning to experiment with adaptation. Adaptation (or training, customization, learning) is generally considered either supervised or unsupervised. In supervised adaptation, feedback is provided by a skilled operator. The feedback in unsupervised adaptation is
provided by downstream components of the system itself, and is therefore more likely to be error prone. We make use of both types of adaptation.
Supervised adaptation: We intend to carry out an entire conversion task as a sequence of sub-tasks. Each sub-task calls for the complete conversion of part of the data. For a single map (our present task), the first sub-task is a 3″ × 3″ section of the map; the entire 18″ × 24″ map might consist of 48 sub-tasks. The sub-tasks need not all be of the same size: in fact, we expect the size of consecutive sub-tasks to grow roughly exponentially (e.g., 3″ × 3″, 6″ × 6″, 12″ × 12″, ...). For multi-map conversion, each new map may be a sub-task. Each sub-task consists of two major cycles. The first cycle is automated, while the second cycle requires interactive operator correction of all errors made by the system. The sequence of automated and interactive steps is preceded by an interactive initialization step. As an example of supervised adaptation, consider the interpretation of street labels. The error rate obtained by our OCR algorithm depends heavily, as do most other OCR algorithms, on the amount of training data. Only a small number of character prototypes (fewer than 200) are obtained from the initialization step; therefore, a relatively large number of street names may be misrecognized on the first 3″ × 3″ chip. Certainly all the rare characters for which no templates are available, because they did not occur in the initial data, will be misrecognized. Once the operator corrects these, hundreds of character prototypes can be used to construct better recognition templates. Therefore, recognition performance on the next chip will be improved. The character recognition portion of the system also makes use of the context provided by common generic street names, such as ST, AVE, SQ. Again, some of these may not be included during the initialization step, but they will be discovered during the label-street association phase and added to the lexicon. We shall extend supervised learning to the layer-separation, vectorization, and association phases as well. This requires, however, a more complex analysis of operator interaction. We plan to use an expert system shell to decide which parameter to adapt as a result of a set of operator corrections.
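A minimal sketch of this supervised cycle for the OCR stage; the function names are placeholders (extract_templates and recognize stand in for the word-shifting [10] and level-building [11] procedures, and operator_correct for the interactive cycle):

```python
def convert_subtasks(chips, seed_names, extract_templates, recognize, operator_correct):
    """Process sub-tasks (chips) of growing size; after each interactive cycle,
    fold the operator's corrections back into the template set and the lexicon."""
    templates = extract_templates(seed_names)        # from the operator-labeled names
    lexicon = {"ST", "AVE", "SQ"}                    # generic labels known so far
    for chip in chips:
        hypotheses = recognize(chip, templates, lexicon)    # automated cycle
        corrected = operator_correct(chip, hypotheses)      # interactive cycle: corrected name strings
        templates.update(extract_templates(corrected))      # new character prototypes
        lexicon.update(name.split()[-1] for name in corrected)  # newly seen generic labels
    return templates, lexicon
```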
Unsupervised adaptation: Unsupervised adaptation, or internal feedback, takes place entirely during the automated cycles. The consistency of the data produced by a particular processing step can be evaluated by some downstream processing step. If the output of the first step is found (in a statistical sense) to be inconsistent, then this information is used to change the parameters of a previous step. As an example, it is possible to evaluate the vectorization of the street layer by accumulating statistics on the length of the street-line segments, the frequency of paired and unpaired segments, open polygons, average line
thickness, and the degree of line-segment intersections. If these statistics do not correspond to those expected for sound vectorization, then the street layer can be extracted again with changed parameters. The new layer is then vectorized and evaluated. The underlying notion is that higher-level characteristics (e.g., "urban blocks are surrounded by a solid line" or "the street network is connected") are more reliable than pixel configurations. Similarly, a large number of unrecognizable characters can be used to detect errors in the extraction of the street-name layer. Feedback of the location and shape of the best-fitting templates, and of the coordinates of the vectorized line segments, can be used to improve discrimination of the street and label layers wherever characters overlap line art. Further downstream, repeated failures to find a consistent street-label association may signal a need to change the parameters of the vectorization itself (ArcScan offers more than a dozen settable parameters). We are not yet far enough along to provide convincing examples of successful feedback from such consistency checks, but we intend to pursue their implementation energetically.
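A sketch of such a consistency check for the vectorization step; the statistics mirror those listed above, but the thresholds are illustrative placeholders and the segment and polygon attributes are assumed to be available from earlier processing:

```python
def vectorization_is_sound(segments, polygons,
                           min_mean_length=50.0,      # hypothetical thresholds; in practice
                           max_unpaired_ratio=0.2,    # these would themselves be tuned
                           max_open_polygon_ratio=0.1):
    """Judge, from aggregate statistics, whether the street-layer vectorization
    is plausible; if not, the layer is re-extracted with changed parameters,
    then vectorized and evaluated again."""
    if not segments:
        return False
    mean_length = sum(s.length for s in segments) / len(segments)
    unpaired = sum(1 for s in segments if not s.paired) / len(segments)
    open_polygons = sum(1 for p in polygons if not p.closed) / max(len(polygons), 1)
    return (mean_length >= min_mean_length
            and unpaired <= max_unpaired_ratio
            and open_polygons <= max_open_polygon_ratio)
```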
5 Cost model

The processing of a 1000 dpi image of a topographic quadrangle does require substantial processor, memory, disk, and software resources. Even so, in constant dollars, the computing costs continue to fall dramatically and are likely to be dominated by the human operator costs. Therefore, we ignore the computing costs in our cost models and further assume that the operator costs are directly proportional to the total operator time.
Manual conversion: Our graphic interface for interactive error correction is suitable both for manual conversion and as an adjunct to automated processing. With the enlarged black-layer image in the background as the guide, the operator can manually complete the task by drawing the street segments, adding street names, and associating segments with names. For the 3″ × 3″ Brentwood chip shown in Figure 1, our log shows that it took 31.4 minutes to manually convert the street layer. If the rest of the map were of the same complexity, it would take approximately 24 hours to convert the street layer for the whole map. This estimate is a useful measure of the cost of conversion using the currently available interactive tools. As such, it is also a good reference point for the evaluation of our approach. A more refined cost model for manual conversion can be derived from an analysis of the operator log for the conversion of the Brentwood chip. The data appear in Table 2. The last column shows the total time normalized for each type of operation. The "Change Display" operation refers to the panning and zooming required for conversion. This could be necessary during any of the first three operation types; therefore, in the last column we have used the total number of operations (of all types) for normalization.
Table 2: Operator log summary.

  Operation Type      Number   User Time (s)   CPU Time (s)   Total Time (s)   Time (s) per Operation
  Draw Segments          149            1137             11             1148                      7.7
  Add Names               27             296              0              296                     11.0
  Associate Names         46             197              9              206                      4.5
  Change Display           -             132             63              195                     0.88*
  Correct Mistakes         -              10              0               10                     0.05*
  Total                  222            1772             83             1855                     8.36*

  * Time normalized over the total number of operations of all types.

Similarly, mistakes could be made during any type of operation, justifying the normalization used. The normalized values in the last column may be used as the parameters of a linear cost model for manual conversion. The variables of this model are the numbers of segments, names, and associations. These may be estimated through sampling or other means.
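With the normalized times of Table 2 as coefficients, the linear model can be written out directly. The sketch below lumps the display changes and mistake corrections into a single per-operation overhead term (0.88 + 0.05 seconds); plugging in the Brentwood counts (149, 27, 46) reproduces roughly the 1855 seconds logged for its manual conversion:

```python
def manual_conversion_secs(n_segments, n_names, n_associations,
                           t_segment=7.7, t_name=11.0, t_assoc=4.5,
                           t_overhead=0.93):  # display changes + mistakes, per operation
    """Linear cost model for manual conversion, coefficients from Table 2 (seconds)."""
    n_ops = n_segments + n_names + n_associations
    return (t_segment * n_segments
            + t_name * n_names
            + t_assoc * n_associations
            + t_overhead * n_ops)

# manual_conversion_secs(149, 27, 46) -> about 1858 s, close to the 1855 s logged.
```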
Semi-automated conversion: Here, the operator is required initially to provide training data for the various processing modules and, subsequently, to correct errors after each cycle of automated processing. Street features missed in automated conversion can be added with the operations used for manual conversion, but other errors require additional operations provided in the user interface: to delete or correct names, delete segments, disassociate segments from the current street name, and to add, move, or delete points in a segment. Since both the errors of omission and of commission are highly dependent on the image and on algorithmic parameters, the development of an accurate cost model depends on our ability to express the types and numbers of user operations as a function of these parameters. In the case of OCR, image quality is found to be the determining factor in the performance of leading devices. Therefore, attempts have been made to predict OCR accuracy using simple image features, such as the amount of white speckle and of broken characters [12]. The state of the art in map conversion, on the other hand, is far less advanced, and even conversion from a high-quality, high-resolution map image is challenging. Therefore, we will attempt to relate performance only to algorithmic parameters.
6 Evaluation

To evaluate our approach, we seek to measure the extent to which we have been successful in automating the conversion process. The most significant indicators are cost and accuracy. Our main cost concern is operator time. By summarizing the log files, we can determine not only the amount of time spent on various corrections, but also the magnitude and type of the errors being corrected. This will help demonstrate which parts of the automated process still need the most work, and guide future research. To assess accuracy, we verify our results against an independently converted version of the map. For this purpose, we have investigated both the Census Bureau TIGER (Topologically Integrated Geographic Encoding and Referencing) files and the USGS DLG (Digital Line Graph) databases. The TIGER files lack the positional accuracy that we need, but the DLG files are derived from the same source map and have positional accuracy that meets established standards. This provides an established "ground truth" against which we measure our positional accuracy and the completeness of street-name association. Furthermore, this will allow us to verify the extent to which the operator has removed any residual errors.
User log files: From the operator corrections, we obtain a number of measures of the quality of the automated conversion. In particular, we can characterize the quality of vectorization, text recognition, and street-name association. The measurements compute the percentage of street segments that suffer from the following errors: missing, extra, or misplaced street segments; missing or incorrect names; and incorrect associations.
Comparison to DLG: Using ARC/INFO, we can perform several comparisons to the DLG "ground truth". This allows us to characterize positional accuracy before and after operator correction. The measures include:

Total network length: We expect to find the same street-network length as that present in the DLG files.

Closeness: Accuracy standards for DLG require that 90% of points be within 0.02 inches of the original map position. We compute the percentage of the street network within this tolerance of the DLG street lines, showing how well excess lines have been excluded. Conversely, computing the percentage of the DLG street lines that are within the tolerance of the recognized street lines shows how complete our coverage is.

Intersections: We can estimate the positional accuracy of intersections by computing the average distance from each of our recognized intersections to the nearest DLG intersection.
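A sketch of the closeness computation, assuming both street networks are available as dense point samples in map-inch coordinates (a brute-force nearest-point search is sufficient for an evaluation run):

```python
import math

def fraction_within_tolerance(sample_points, reference_points, tol=0.02):
    """Fraction of points sampled along one street network that lie within
    `tol` map inches of the nearest point of the other network.  Run in one
    direction (ours vs. DLG) it measures how well excess lines are excluded;
    run in the other (DLG vs. ours) it measures completeness of coverage."""
    def nearest_distance(p):
        return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in reference_points)
    hits = sum(1 for p in sample_points if nearest_distance(p) <= tol)
    return hits / len(sample_points)
```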
7 Summary

Our studies are intended to accelerate the introduction of automated map-conversion systems into an operational environment. The contributions to topographic map conversion that we are developing are:

- sub-layer separation with feedback from the vectorization and OCR stages;
- line-pair vectorization with map-specific constraints;
- multi-thread intersection analysis based on standard graph-tracing algorithms;
- map-specific OCR with a high level of noise immunity;
- generalized arc-label association based on generic and specific labels;
- exploitation of operator corrections for supervised adaptation;
- feedback from downstream processing routines to improve upstream operations;
- a methodology for detailed accuracy assessment against reliable ground truth;
- a predictive cost model based on an automated "time-motion" study of the operator.
References

[1] Environmental Systems Research Institute, Inc., ARC/INFO Software, 1997.
[2] G. Nagy and S. Wagle, Geographic data processing, ACM Computing Surveys, 11:2:139-181, 1979.
[3] R. Kasturi, R. Fernandez, M. L. Amlani, and W.-C. Feng, Map Data Processing in Geographic Information Systems, IEEE Computer, 22:12:10-21, 1989.
[4] R. D. T. Janssen, The application of model-based image processing to the interpretation of maps, Doctoral Dissertation, Technical University of Delft, 1995.
[5] J. Den Hartog, A framework for knowledge-based map interpretation, Doctoral Dissertation, Technical University of Delft, 1995.
[6] G. K. Myers, et al., Verification Based Approach for Automated Text and Feature Extraction from Raster-Scanned Maps, Graphics Recognition: Methods and Applications, R. Kasturi and K. Tombre (Eds.), Lecture Notes in Computer Science 1072, Springer, pp. 190-203, 1996.
[7] G. Nagy, et al., A Prototype for Adaptive Association of Street Names with Streets on Maps, to appear in Int. Workshop on Graphics Recognition '97, August 1997.
[8] L. A. Fletcher and R. Kasturi, A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, IEEE Transactions on PAMI, 10:6:910-918, 1988.
[9] T. Kaneko, Line Structure Extraction from Line-Drawing Images, Pattern Recognition, 25:9:963-973, 1992.
[10] G. Nagy and Y. Xu, Automatic Prototype Extraction for Adaptive OCR, Proc. ICDAR-97, Ulm, Germany, 1997.
[11] L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[12] L. R. Blando, J. Kanai, and T. A. Nartker, Prediction of OCR accuracy using simple image features, Proc. Int. Conf. on Document Analysis and Recognition, Vol. I, pp. 319-322, 1995.
Acknowledgments

This work is supported by a grant from the National Imagery and Mapping Agency as a part of the Intelligent Map Understanding Project. Support is also provided by the University of Nebraska-Lincoln Center for Communication and Information Science. Part of this work was carried out in the New York State Center for Advanced Technology (CAT) in Automation, Robotics and Manufacturing at Rensselaer Polytechnic Institute. The CAT is partially funded by a block grant from the New York State Science and Technology Foundation.