Automatic Annotation of Geographic Maps

Mirko Horstmann1, Wilko Heuten2, Andrea Miene1, and Susanne Boll3

1 Center for Computing Technologies (TZI), Universität Bremen, Postfach 330440, 28334 Bremen, Germany
{mir, miene}@tzi.de, http://www.tzi.de/bv
2 OFFIS, Escherweg 2, 26121 Oldenburg, Germany
[email protected], http://www.offis.de
3 University of Oldenburg, Department of Computing Science, Escherweg 2, 26121 Oldenburg, Germany
[email protected]
Abstract. In this paper, we describe an approach to generate semantic descriptions of entities in city maps so that they can be presented through accessible interfaces. The solution we present processes bitmap images containing city map excerpts. Regions of interest in these images are extracted automatically based on colour information, and their geometric properties are subsequently determined. The result of this process is a structured description of these regions based on the Geography Markup Language (GML), an XML-based format for the description of GIS data. This description can later serve as an input to innovative presentations of spatial structures using haptic and auditory interfaces.
1 Introduction
Many of our daily tasks require knowledge about the layout and organisational structure of physical environments. These tasks include navigation and orientation as well as communication about geographic locations, and they are most often supported by maps. A map can be seen as a two-dimensional representation of a real-world environment with a reduced amount of information that is created for a specific context and goal. Information stored in maps can be used to build a mental model of the physical space and to understand geographical entities and relationships. Maps use graphical representations to visualise an area's spatial layout and the semantic entities it contains, such as parks, gardens, buildings and streets. Most of the existing map material is stored and managed in Geographic Information Systems (GIS) by the publisher and typically includes the modelling of the geographic area and the different elements and layers that belong to it. However, in many cases an end user only
receives a bitmap of the map in which all semantic entities exist only implicitly, encoded as coloured pixels. This almost complete loss of semantic information excludes many people from using geographic maps as an orientation and exploration support: visually impaired and blind people cannot see the layout of the map, illiterate people do not have access to the text included in maps, and people with motor deficiencies cannot point to tiny elements on the map as there is no option to zoom into them in a semantic fisheye fashion. Unfortunately, visually impaired people rely even more on the information stored in maps than sighted people, because building a detailed mental model is required as preparation for tasks like navigation and orientation in unfamiliar environments.

A number of techniques can be employed to make maps accessible. The most widely used method is tactile printouts of the maps, which can be produced on swell paper or, in a more complex process, with thermoforming. However, tactile diagrams suffer from their relatively low resolution and the limited ability of the fingertips to recognise fine structure. Patterns that symbolise different areas on a map are therefore limited to a few distinguishable textures. Furthermore, lines of Braille text are usually 6 mm high and cannot be reduced in size, which further clutters the tactile image. As a result, existing maps usually have to be completely redesigned before they can be presented in a tactile format; contrary to what one might think, there is no fully automatic process for this conversion. Nevertheless, tactile maps are an important aid for blind people to familiarise themselves with new environments, as is shown in [6]. Projects like TACIS [1] or, more recently, BATS [2] have tried to overcome the limitations of tactile maps with combined tactile and auditory output. Like most approaches that use specialised, non-visual presentations to convey information, they rely on the conversion of existing material, which often means that a laborious manual process must be applied if this material is not already in a structured format that includes semantic descriptions. The TeDUB [3] system therefore aimed to semi-automatically interpret simple diagrams for an accessible presentation using image and knowledge processing techniques.

Our approach to the problem is a software system that extracts semantic entities from maps provided as bitmap images. The software first identifies coherent regions of similar colour and classifies them as one of several known types. Next, nearby regions are grouped to form single entities. For each of these, a structured representation of its shape and its type is generated in the standardised Geography Markup Language (GML). This description then forms the semantic annotation of the map, which can be used in various ways, e.g. for non-visual representations like haptic, tactile or auditory display as well as any multimodal combination. With this information the user can build a mental model and become familiar with a physical environment. The proposed use case is the exploration of a given area to gain an overview, rather than exact navigation. This first implementation is therefore meant as an alternative to tactile orientation maps. Our proposed solution focuses on city maps; however, most ideas can be applied to other map types. Maps may later enter the system through a scanner interface for printed material or as bitmap images from web pages.
2 Requirements for the Software
In this section, we identify the requirements for a system that extracts semantic information from existing city maps automatically and provides this information in a format that can be used by other systems for a non-visual representation for blind, visually impaired or otherwise print-impaired people. The analysis addresses three topics: the kind of information that has to be conveyed to the user, the requirements for the existing maps from which the information is extracted, and the interchange format for the information.
2.1 Entities of City Maps
There are no open standards specifying which objects a city map should contain; publishers tend to establish their own conventions. We have therefore compared several city maps from various online and print publishers and found the following set of typical objects:

– Parks and gardens
– Water (lakes, rivers, seas)
– Streets of various types
– Squares and places
– Quarters and other organisational structures
– Monuments, sights, points of general interest (hotels, shops, ...)
– Public buildings (churches, schools, town hall, ...)
– Public transportation information (stations, routes, ...)
– Bridges and tunnels
– Additional objects that help to interpret the map (keys, scale, north arrow, ...)
Most of these items are associated with additional attributes such as their names or one-way directions for traffic. Moreover, a city map implies the location of objects relative to other objects, based on a mapping from the real world. The city map also conveys the shapes of certain object types, and with the help of a scale a user can measure distances and determine the sizes of objects.

In order to become familiar with a city and to get an initial overview, large objects, their shape and location, as well as distances between objects are more important than small streets and single buildings. Furthermore, too much information at the same time makes it more difficult for the user to build a mental model of the depicted city. The presentation should therefore only include larger objects or groups of smaller objects of the same type that are geographically close to each other; for example, single houses should be grouped into blocks and several blocks into residential areas. In order to reduce the amount of information presented at the same time, functions that appear in current map viewers are also useful for non-visual representations. These include filtering objects, changing the level of detail, zooming and panning. They have to be implemented in the specific viewer and are not covered in this paper, although the underlying format used to communicate the data must support them (see Section 2.3).
2.2 Maps for Semantic Extraction
Maps usually come in two formats: bitmap images and vector graphics (GIS data). Although vector graphics are more amenable to the task of extracting the necessary spatial information about areas of interest, bitmap versions are often more easily available (e.g., on web pages or as scanned images of printed maps), whereas vector versions usually have to be obtained commercially and are then rather restricted regarding their use and redistribution.
2.3 General Requirements for Modelling Semantic Annotations of City Maps
It is important that the format for storing the extracted semantic information is open, easy to read and easy to distribute. A standardised modelling language for geographic entities is therefore strongly recommended. In order to share semantic information across maps and publishers, the semantic information should be stored separately from the map itself. The format should be powerful enough to describe the entities listed in Section 2.1 and their attributes. Keeping the later non-visual representation in mind, the description of the geographic objects should not be in a visual format, e.g. pixels, but rather in a vector format, which can be used for haptic and auditory rendering. Furthermore, a publisher should be able to extend the description for individual needs.
3 Automatic Extraction of Semantic Information from City Maps
There are various types of maps, which encode geographic information in different ways. Nevertheless, sighted people are able to identify the most important features at a glance. In this paper we focus on city maps with a typical set of colours that distinguish different entities like watercourses, parks or buildings, and our approach makes use of this colour code for an automatic extraction of entities. After a pre-processing step that removes text and noise from the image, image regions with a specific colour (one that lies within a given interval of values for the separate colour channels) are detected. This results in a set of image masks, each of which marks all regions of one respective kind of entity. Nearby regions are then grouped to form areas with the respective entities.

Since we do not consider textual information printed on the map in this first prototype, our first processing step is to remove text through morphological operations. An example is shown in Fig. 1, where black text is removed through a morphological closing operation. This approach is simple but suitable for removing text on most maps. If a map includes text in different colours, our more advanced approaches for text/background separation could later be employed (see e.g. [5, 4]). The next step is to find image regions with a specific colour which represent certain objects. The colour of each requested object is specified by colour intervals given by a minimum and maximum value for each of the red, green and blue colour channels. To reduce noise, the image is smoothed with a median filter.
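As a rough illustration of these pre-processing and segmentation steps, the following minimal sketch uses OpenCV (which the prototype also builds on, see the acknowledgement) through its Python bindings; the file name, kernel sizes and colour bounds are illustrative assumptions rather than values used in the described system.

import cv2
import numpy as np

# Hypothetical input file: any RGB city-map bitmap.
img = cv2.imread("city_map.png")                      # loaded as BGR

# 1. Remove dark map text: a morphological closing (dilation followed
#    by erosion) fills dark details smaller than the structuring element.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
no_text = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

# 2. Reduce remaining noise with a median filter.
smooth = cv2.medianBlur(no_text, 5)

# 3. Binarise by colour interval: an assumed blue range standing in for
#    watercourses, given as per-channel minima and maxima (BGR order).
lower = np.array([150,  80,   0], dtype=np.uint8)
upper = np.array([255, 180, 120], dtype=np.uint8)
water_mask = cv2.inRange(smooth, lower, upper)        # 255 inside the interval, 0 outside

Applying the same binarisation once per entity type yields the set of image masks described above.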
Fig. 1. City map example “Britzer Garten” (original, left) and after removing textual information (right). © Falk Verlag, http://www.falk.de
Fig. 2. City map example “Britzer Garten” and segmentation results for watercourses. © Falk Verlag, http://www.falk.de
To segment the image regions belonging to a known entity, the image is binarised using the colour intervals specified for that entity. The resulting image mask marks the pixels that lie within the given colour intervals and therefore represent the requested objects. Small areas that are irrelevant for further interpretation are removed from the binary image by morphological operations. Figure 2 shows an example of the segmentation of watercourses.

Often, objects are split into several image regions by other objects. A typical example is a park which is divided into several green image regions by paths or roads crossing through it. Such regions have to be grouped together and treated as one object during further interpretation. To group them, a distance threshold is specified up to which regions of the same colour are clustered together. A cluster of regions is then represented by its convex hull, which is later described as a polygon. The clustering step also allows us to group several buildings into a building complex. Fig. 3 shows how the clustering process is influenced by the distance threshold.
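The grouping step can be sketched as a continuation of the OpenCV snippet above. The paper only states that regions of the same colour within a distance threshold are clustered and represented by their convex hull; the variant below, which dilates the mask by half the threshold to bridge small gaps and then labels connected components, is therefore an assumed realisation, not the implemented algorithm.

# Regions whose gaps are narrower than the threshold merge into one cluster.
distance_threshold = 15                               # in pixels, cf. Fig. 3 (left)
r = distance_threshold // 2
bridge = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
merged = cv2.dilate(water_mask, bridge)               # bridges gaps up to roughly the threshold

# Label the merged clusters, then hull the original (undilated) pixels of each.
num_labels, labels = cv2.connectedComponents(merged)
hulls = []
for label in range(1, num_labels):                    # label 0 is the background
    ys, xs = np.where((labels == label) & (water_mask > 0))
    if len(xs) < 3:                                   # skip remnants too small for a polygon
        continue
    points = np.column_stack((xs, ys)).astype(np.int32)
    hulls.append(cv2.convexHull(points))              # polygon vertices of one cluster

Each entry of hulls is the convex-hull polygon of one cluster and can be handed on to the GML serialisation described in the next section.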
4 GML for Modelling Semantic Information on City Maps
The Geography Markup Language (GML), an initiative by the Open Geospatial Consortium (OGC), enables the specification of two- and three-dimensional geographical objects (also referred to as features). It provides only a general framework for describing geographic features (e.g., it provides a “polygon” element rather than “lake”, “forest” or “building” elements).
Fig. 3. Clustering of watercourses with a distance threshold of 15 (left) and 60 (right). © Falk Verlag, http://www.falk.de
It is the application developer's task to specify his or her own application schema. By using and extending GML we can describe and model the extracted semantic information discussed in Section 2.1 and ensure that the format is open and standardised, and therefore readable by everyone who wants to convey geographical information to blind and visually impaired users.

Section 2.1 lists a set of geographic entities that we need to describe, a “collection of features” in the GML nomenclature. GML provides abstract types for features as well as for collections; in order to use them, they must be extended to form concrete types. For our schema, we have therefore defined two element types, “GeographicFeatureCollectionType” and “GeographicFeatureType”, based on these abstract types. From them we instantiate two elements, “FeatureCollection” and “Feature”, where the FeatureCollection element holds all Feature elements. Our features are either buildings, parks, lakes, sights, or squares. We have therefore defined a required additional attribute on the “GeographicFeatureType”, which forces the author of a GML instance to categorise each feature accordingly.
There are several standard object properties that we can assign to each feature. For example, we can assign a name and a description to a feature, although this is not mandatory; we can also specify the bounding box of the feature if we wish. For our purpose, however, the really important information is the definition of the features as polygons with their vertex coordinates, together with their “featureType”. The GML location element allows us to specify the location of a feature as a polygon element in which the vertex coordinates are given. Features can also be points, curves, multi-point lines, as well as more general objects, so this schema lets us add new geographic elements easily.

In the gml:Polygon element, we specify the exterior using the “gml:exterior” element. A polygon exterior consists of an exterior linear ring whose coordinate pairs are the vertex coordinates of the polygon. Thus, there must be at least three vertices, with the last pair of coordinates being the same as the first. The coordinate pairs are also referred to as control points of the linear ring and are specified with the “gml:pos” element, the two coordinates separated by whitespace. For example, the description of a building's outline looks like this:

  <gml:Polygon>
    <gml:exterior>
      <gml:LinearRing>
        <gml:pos>619 209</gml:pos>
        <gml:pos>643 125</gml:pos>
        <gml:pos>706 84</gml:pos>
        <gml:pos>716 99</gml:pos>
        <gml:pos>677 228</gml:pos>
        <gml:pos>619 209</gml:pos>
      </gml:LinearRing>
    </gml:exterior>
  </gml:Polygon>

In this case, the coordinate pairs are simply the pixel coordinates of the bitmap image. GML supports various coordinate reference systems, so that instead of these pixel coordinates we could later use Gauss-Krüger coordinates.
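To tie the two stages together, the polygons extracted in Section 3 can be serialised into such fragments programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the Feature element and its featureType attribute follow the application schema outlined above, while the helper name feature_from_hull and the FeatureCollection wrapper shown here are our own illustrative choices, not code from the described system.

import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML)

def feature_from_hull(hull_points, feature_type):
    # hull_points: iterable of (x, y) pixel coordinates, e.g. one convex hull
    # from the previous section; feature_type: "building", "park", "lake", ...
    feature = ET.Element("Feature", {"featureType": feature_type})
    location = ET.SubElement(feature, "{%s}location" % GML)
    polygon = ET.SubElement(location, "{%s}Polygon" % GML)
    exterior = ET.SubElement(polygon, "{%s}exterior" % GML)
    ring = ET.SubElement(exterior, "{%s}LinearRing" % GML)
    points = [(int(x), int(y)) for x, y in hull_points]
    for x, y in points + [points[0]]:                 # close the ring: repeat the first vertex
        ET.SubElement(ring, "{%s}pos" % GML).text = "%d %d" % (x, y)
    return feature

# Example: the building outline from above, wrapped in a FeatureCollection.
collection = ET.Element("FeatureCollection")
collection.append(feature_from_hull(
    [(619, 209), (643, 125), (706, 84), (716, 99), (677, 228)], "building"))
print(ET.tostring(collection, encoding="unicode"))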
5 Results and Future Work
In this paper we present a promising approach for improving the accessibility of maps. We show that automatic methods can be applied to extract geographic information from
maps, which leads to a semantic annotation. We use the open standard GML to describe the extracted semantic information in a structure that can be transformed into auditory or haptic presentations. Our solution shows high potential for helping people with special needs to access maps in bitmap format, the most common map format on the Web.

Our future work will concentrate on two aspects. First, the exploitation of other image features such as text or symbols to extract additional kinds of information; this will not only extend the amount of useful information extracted from city maps but will also allow us to investigate the usefulness of our approach for other kinds of maps that rely less on colour codes. Experience in image processing suggests that more complex information cannot always be extracted fully automatically, so a semi-automatic process should be the goal. Second, a closer integration of the methods into an existing workflow, which will allow a content creator to use them in the larger scope of accessible map creation.
Acknowledgement

This work is supported by the European Community's Sixth Framework Programme, FP6-2003-IST-2-004778. We would like to thank Alexander Köhn for his support during the development of the software prototype. The prototype uses OpenCV, the Open Source Computer Vision Library (http://www.intel.com/technology/computing/opencv/index.htm).
References

1. Gallagher, B., Frasch, W.: Tactile Acoustic Computer Interaction System (TACIS): A new type of graphic access for the blind. In: Proceedings of the 3rd TIDE Congress: Technology for Inclusive Design and Equality, Improving the Quality of Life for the European Citizen, Helsinki, June 1998.
2. Parente, P., Bishop, G.: BATS: The Blind Audio Tactile Mapping System. In: Proceedings of the ACM Southeast Conference (ACMSE '03), Savannah, GA, March 2003.
3. Horstmann, M., Lorenz, M., Watkowski, A., Ioannidis, G., Herzog, O., King, A., Evans, D. G., Hagen, C., Schlieder, C., Burn, A.-M., King, N., Petrie, H., Dijkstra, S., Crombie, D.: Automated interpretation and accessible presentation of technical diagrams for blind people. New Review of Hypermedia and Multimedia 10(2), 141–163, December 2004.
4. Becker, H.: Automatische Extraktion von Szenentext. Diploma thesis, Universität Bremen, 2005.
5. Miene, A., Hermes, T., Ioannidis, G.: Extracting Textual Inserts from Digital Videos. In: Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR '01), Seattle, Washington, USA, September 10–13, 2001, pp. 1079–1083. IEEE Computer Society, 2001.
6. Espinosa, M. A., Ungar, S., Ochaíta, E., Blades, M., Spencer, C.: Comparing methods for introducing blind and visually impaired people to unfamiliar urban environments. Journal of Environmental Psychology 18, 277–287, 1998.