
Geo-Enabling Spatially Relevant Data for Mobile Information Use and Visualisation

Alistair J. Edwardes1, Dirk Burghardt1, Eduardo Dias2, Ross S. Purves1, and Robert Weibel1

1 Department of Geography, University of Zurich, Winterthurerstrasse 190, 8057 Zurich (Switzerland)
{aje, burg, rsp, weibel}@geo.unizh.ch
http://www.geo.unizh.ch/gis

2 Spatial Information Laboratory, Free University Amsterdam
[email protected]

Abstract. This paper addresses the methodological issues of developing and visualising added value geo-enabled content for mobile information systems. The research was carried out under the framework of WebPark, an EC-IST R&D project that developed a location aware application for nature/protected areas. The evaluation of existing information sources (tourism information, research data, and multimedia content) revealed that the tourism information and multimedia content analysed did not have an active geographic component, and that the geographic research data (e.g. animal counts/observations) clearly did not match the visitors’ information needs. Therefore, different types of data processing were developed and performed in order to render existing information sources useful for location-based services. Data preparation also included building hierarchical data structures (quadtrees and hierarchical stream ordering). The paper shows how these data structures are exploited to facilitate real-time generalisation in order to efficiently present thematic point data on portable devices.

Keywords. mobile information services, LBS, environmental information, geo-enabling, visualisation, hierarchical data structures, tessellations, real-time cartographic generalisation

1 Introduction

Protected areas are created first and foremost with the aim of conserving natural heritage and secondly for supporting the leisure or tourism industry. In this regard, environmental education is, for most protected areas, a major mandate [7]. Mobile technology is becoming increasingly available and its usage is nowadays widespread [2]. Therefore, mobile Internet devices with geo-location capabilities may create an opportunity for meeting the present information needs of visitors to natural areas. Location-based services (LBS), as an example of such services, can play a role in helping visitors to natural and protected areas (e.g. national or regional parks) achieve full awareness of the richness of natural and cultural resources, improving awareness levels and contributing to environment-friendly visits. However, although the technology is mature enough to deploy mobile and context aware applications, most of the data available at sites is not ready to be used in such applications.

This paper investigates this issue and addresses the processing and management of existing information sources in order to create added value, geo-enabled content for LBS to facilitate the provision and visualisation of information in natural areas. The paper builds on experience gained in WebPark, a European IST project that developed a series of location-based services for users of protected areas (for information and sample pictures of the system, see www.webparkservices.info). The services are based on wireless technology and available for mobile phones and PDAs. The prototype system has been developed at two partner sites, the Swiss National Park (SNP) and the Wadden Sea Islands (NL), and tested with real visitors [8]. The system includes the following features [9]:

• automatic location of a user on a map (via GPS);
• points of interest (POI) search, both spatially and using free text;
• access to fauna and flora information, including multimedia data;
• context aware retrieval of information, respecting visitors’ spatial behaviour and personal preferences;
• dynamic mapping of retrieved information including real-time map generalisation;
• the ability for visitors to add their own geographic data as ‘geographical bookmarks’; and
• location-sensitive warnings (e.g. areas with increased protection level).

The system uses wireless communication between the server and the mobile devices (PDAs). However, since the system is used in rural areas with only partial coverage for wireless communications, client-side caching is used to ensure a continuous visitor experience.

2 Overview of Data Preparation and Management

The WebPark system and related processes can be viewed, in simplified terms, as a publishing tool that allows intensive sharing of local knowledge with visitors. The information available plays a crucial role in providing true added value for visitors. Hence, geographic information (GI) and multimedia content needed to be adapted or created in order to meet the required accuracy and the expectations of the visitor. The GI content needed for the WebPark service can be divided into ‘background’ and ‘foreground’ types. Typical background GI consists of topographic base map data (e.g. roads, paths, coastlines, water features and boundaries), false colour imagery classified by land cover, terrain information, and public service and safety information. By contrast, foreground GI contains processed and interpreted GI and multimedia such as animal distribution, POIs, flowers in blossom, up-to-date photographs and other multimedia information. In order to prepare the GI content for WebPark, an extension to the ETL (Extract Transform Load; [15]) process was defined. Figure 1 illustrates the extended process.

Fig. 1. Data processing flow in WebPark

The workflow can be considered a linear process that starts with the determination of available datasets (information audit) and ends with the display of information to visitors. It can be considered an ETL process, where the data is extracted from its original storage medium, transformed to conform to the specified data model, and loaded into a relational database. The extraction step also involves the analysis of the data in order to select a spatially and semantically relevant subset. The selected data is then transformed through a set of operations necessary to harmonise and standardise the data according to the WebPark data model. Depending on the dataset these will include:

• Obtaining the copyright or the permission of usage for the source data
• Reprojection – spatial data is projected into a single coordinate system
• Model generalisation – applying geometric processes such as line filtering and aggregating features that are too small or defined with semantics that are too detailed
• Remodelling – harmonising the sources’ data models with the WebPark one
• Reformatting – converting the heterogeneous data formats (mime-types) of native multimedia content into the formats understood by the WebPark components
• Translation – WebPark information is provided in English, French, German, Italian and Dutch
• Geo-enabling – associating a geographically-sensitive footprint with data

The two processing steps that are methodologically most demanding and interesting, geo-enabling and generalisation, will be described in more detail in sections 3 and 4, respectively. For more details on the other processing steps, see [8].

The transformed data are loaded and stored in a relational DBMS (Oracle) where they can be accessed via web service interfaces (Web Feature Server (WFS) and Web Map Server (WMS)). The WMS allows retrieval of background maps and also of specific maps that are linked to foreground information but contain relatively static content, e.g. seasonal animal distributions. The WFS enables access to the features in the format used by applications (GML). The client application caches this data in its file management system (FMS), and the cached data is used to provide foreground information in response to visitor interaction (e.g. on demand searching and mapping of POIs related to a visitor’s context). [8] provides details about the data models and processing interfaces used in the WebPark system.
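To make the line filtering mentioned under model generalisation more concrete, the following is a minimal Python sketch. The text does not say which filter WebPark applied; the Douglas-Peucker algorithm is used here purely as a common stand-in, and the function names and the tolerance parameter are assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def filter_line(points, tolerance):
    """Douglas-Peucker filtering (assumed stand-in for 'line filtering').

    Keeps the end points and every vertex that deviates from the
    simplified line by more than `tolerance` (same units as the input).
    """
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    left = filter_line(points[:index + 1], tolerance)
    right = filter_line(points[index:], tolerance)
    # The split vertex ends `left` and starts `right`; keep it once.
    return left[:-1] + right


path = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = filter_line(path, tolerance=0.5)
```

In this example the nearly collinear vertex at (1, 0.05) is dropped while the pronounced bend at (3, 2) is kept. Running such a filter with several tolerances would be one way to obtain the different levels of detail associated with different target map scales.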

3 Geo-Enabling Data for LBS

One of the main conclusions of the initial information audit was that there was a clear mismatch between the information currently provided by the park and the visitor information needs [6], [1], [8]. Most of the questions from visitors had a high spatial component (e.g. “What animals can I see around me?”, “Can I make a picnic here?”), while the information audit revealed that the available information did not have, or had a very limited, spatial component. The information audit also revealed that the GIS research data was not structured in an interesting way from the visitors’ perspective. A fundamental issue during the transformation process was therefore how to make these data suitable for use in an LBS.

3.1 Remodelling Scientific Data for Species

The method used to organise flora and fauna data separated feature entities (i.e. the species) from their spatial occurrence [8]. Spatial occurrence was modelled with geographical units based on factors such as habitat suitability or historical and recent observations. Alternatively, the units were based on human factors related to the perceptual scope of a visitor, for example the landscape visible from their location in the case of large animals, or the region directly neighbouring the path they are on in the case of small animals, birds or plants. The separation between species and spatial occurrence was made because:

• The amount and type of data available on the spatial occurrence of species differed markedly between regions and species. Hence the system needed to support a wide range of different ways of describing the likely spatial occurrence of species types (e.g. habitat suitability, field observations, density/distribution models, expert knowledge, home ranges etc.).
• Depending on the service, human factors were often more important than inventory. For example, to provide information to support observing animals it was important to consider constraining factors such as the actual visibility from a visitor’s location, taking into account the canopy cover, topography and the proximity to a trail. This helped overcome the limitations of purely radial searching, which ignores such obstructions.
• The applications needed to run on the device when it was offline, using cached data. This meant that data storage on the device needed to be managed. Using geographical units that were distinct from the features they aggregated meant that the data required to perform spatial searching could be reduced.
• The data used to present features on maps could be processed differently from the data used to index them for spatial queries. Map data could be optimised for cartographic considerations (for example using cartographic generalisation) and indexing data could be optimised to support efficient searching, for example by using highly simplified spatial footprints.

Figure 2 illustrates the modelling of visibility regions in the mountainous terrain of the SNP. This involved computing the drainage morphology for stream channels at different stream orders [3]. The resulting hierarchy of regions gave an approximation of the area visible from a point at different scales. These regions were overlaid with forestry data to take into account visual obstruction from vegetation.

[Figure 2 annotation: Spatial context modelling. Visual access regions at different spatial resolutions are derived using terrain and river base data.]

Fig. 2. Hierarchical stream basins and visibility regions

Figure 3 describes how information on the occurrence of different types of bird was remodelled by aggregating information from models of habitat preference in the Swiss National Park [10] to sections of walking trail.

[Figure 3 content: The input table bird_probability holds polygons for habitat suitability (columns gid, geom and 45 bird codes such as WAP, MID, FEL, where 1 indicates presence and 0 absence). In the geometric (re)modelling step, the BirdsFootPrintRemodeler intersects the lines for trails with the habitat geometries. In the semantic (re)modelling step, the FootprintAttributesWriter writes the database table Birds_on_route (columns gid, geom, FIDS), which aggregates geometries and bird species.]

Fig. 3. (Re)modelling process for birds
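The remodelling in Figure 3 can be sketched in a few lines of Python. This is not the WebPark BirdsFootPrintRemodeler; it is a simplified illustration under stated assumptions: trail sections are tested against habitat polygons by sampling points along the trail (an approximation of a true line/polygon intersection), the result lists species codes rather than the feature IDs shown in the figure, and all function names, the sampling step and the toy coordinates are hypothetical.

```python
import math


def point_in_polygon(pt, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_line(line, step):
    """Points along a polyline, at most `step` apart, vertices included."""
    points = [line[0]]
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / step))
        for k in range(1, n + 1):
            t = k / n
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return points


def birds_on_route(trail_sections, habitats, step=1.0):
    """Aggregate bird presence from habitat polygons onto trail sections.

    trail_sections: {gid: [(x, y), ...]}
    habitats: list of (polygon, {bird_code: 0 or 1})
    Returns {gid: sorted list of bird codes present along the section}.
    """
    result = {}
    for gid, line in trail_sections.items():
        species = set()
        for pt in sample_line(line, step):
            for polygon, presence in habitats:
                if point_in_polygon(pt, polygon):
                    species.update(c for c, flag in presence.items() if flag == 1)
        result[gid] = sorted(species)
    return result


# Toy data: two square habitat polygons and three trail sections.
habitats = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"WAP": 1, "MID": 0, "FEL": 0}),
    ([(20, 0), (30, 0), (30, 10), (20, 10)], {"WAP": 0, "MID": 1, "FEL": 1}),
]
trails = {
    1: [(-5, 5), (15, 5)],
    2: [(15, 5), (35, 5)],
    3: [(0, 20), (30, 20)],
}
route_table = birds_on_route(trails, habitats)
```

The point of the design is that a client only needs the trail section a visitor is on to look up candidate species; the habitat polygons themselves never have to be stored on the device.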

3.2 Geo-Enabling Multimedia

Various sources of information well suited to visitors’ needs were available as multimedia in digital form. For example, the SNP produces an interactive CD-ROM containing over 800 high-quality photographs as well as detailed texts with numerous internal and external links, videos and sounds. It was possible to extract this information and relate it to new or existing feature entities in the data model. However, these items often did not have a specific geographic footprint (i.e. they were not geo-referenced). As a result, it was necessary to add a spatial component, allowing the vast amount of rich multimedia information to be filtered based on a visitor’s location or sorted by proximity. Geo-referencing was performed with the help of expert local knowledge. The hundreds of features extracted from the CD-ROM were divided into three groups, depending on how closely they were tied to a location:

1. Non-geographic. Information not linked to a specific location but that should still be available through the system (e.g. general rules of the park, history).
2. Geographic. Information directly linked to a physical location (e.g. hotel, vista spot). This information can be precisely portrayed by the mapping service.
3. Semi-geographic. Information not directly linked to a precise location but that should be available in certain areas. For example, there was no observations dataset for marmots in the SNP research database that could be used to geo-reference multimedia information. Therefore, areas where tourists should be on the lookout for these animals were digitised in consultation with experts. However, these areas were only used for spatially indexing the multimedia and were not portrayed on a map, since they could be misinterpreted as being exact locations for the animals.
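The three groups above imply three different retrieval rules, which the following Python sketch tries to illustrate. It is a hypothetical simplification, not the WebPark data model: the MediaItem class, the use of a bounding box as the indexing footprint of semi-geographic items, the search radius and the ordering of the result are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MediaItem:
    title: str
    group: str  # "non-geographic", "geographic" or "semi-geographic"
    point: Optional[Tuple[float, float]] = None  # geographic items
    bbox: Optional[Tuple[float, float, float, float]] = None  # xmin, ymin, xmax, ymax


def items_for_position(items, position, radius):
    """Select multimedia items for a visitor position.

    Geographic items within `radius` are sorted by proximity,
    semi-geographic items are included when the visitor is inside their
    indexing area (the area itself is never drawn), and non-geographic
    items are always available.
    """
    x, y = position
    nearby, in_area, general = [], [], []
    for item in items:
        if item.group == "geographic":
            d = math.hypot(item.point[0] - x, item.point[1] - y)
            if d <= radius:
                nearby.append((d, item.title))
        elif item.group == "semi-geographic":
            xmin, ymin, xmax, ymax = item.bbox
            if xmin <= x <= xmax and ymin <= y <= ymax:
                in_area.append(item.title)
        else:
            general.append(item.title)
    nearby.sort()
    return [title for _, title in nearby] + in_area + general


catalogue = [
    MediaItem("Park rules", "non-geographic"),
    MediaItem("Hotel", "geographic", point=(100, 0)),
    MediaItem("Vista spot", "geographic", point=(10, 0)),
    MediaItem("Marmots", "semi-geographic", bbox=(-50, -50, 50, 50)),
    MediaItem("Ibex", "semi-geographic", bbox=(500, 500, 600, 600)),
    MediaItem("Remote hut", "geographic", point=(1000, 1000)),
]
selection = items_for_position(catalogue, position=(0, 0), radius=200)
```

Keeping the semi-geographic footprint as an index only, rather than as a map layer, mirrors the concern that digitised lookout areas could be misread as exact animal locations.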

4 Design Considerations for Visualisation

In LBS, information reduction is an important issue for two main reasons. First, network bandwidth and client computational power impose limitations on the amount of information that can usefully be transmitted and processed. Second, the small screen size and resolution of portable devices impose restrictions on the information content and density that can be portrayed in order to maintain the readability of visualisations.

The first limitation is mainly countered in WebPark by the model generalisation step during data preparation (section 2). Model generalisation entails filtering and aggregating geometric as well as semantic information detail in a statistically controlled fashion [16]. In WebPark, this process is run server-side and off-line (i.e. precomputed once before run-time). Different levels of detail are generated for the background data that can be associated with different target map scales. As the required procedures are relatively standard in GIS, they will not be discussed further. For a similar approach, see [4].

Developing methods to counter the second limitation (i.e. limited screen size and resolution) is more demanding. While the detail of the (static) background data layers can be tailored to the small screen size during model generalisation and stored in a multiresolution database containing the right levels of detail for the target map scales, the portrayal of foreground data cannot be pre-computed and requires client-side, real-time processing. The reason is that the foreground thematic data to be visualised are usually retrieved as the result of a specific ad hoc query and therefore cannot be known in advance and pre-compiled into a map.

In the remainder of this paper we present generic techniques for real-time cartographic generalisation of thematic points of interest (POIs), using the example of data representing animal locations (coordinates with related attributes) collected in the SNP [9]. The techniques make use of hierarchical data structures, that is, quadtrees as well as the hierarchical stream basins developed in the data preparation process. We can identify three sets of constraints for delivering visualisations of thematic data to users in real-time:

• the visualisations should be cartographically acceptable;
• any generalisation operations should be performed in ‘real-time’, that is, the user’s wait for a response should be minimal; and
• visualisations of thematic data should maintain meaningful relationships.

Since the data we are dealing with are simple point data, a minimal set of cartographic constraints is sufficient to maximise legibility on small devices:

• minimum symbol size should maintain legibility – a common value is 5 pixels [13]; and
• symbols should not be allowed to overlap or touch.
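The two cartographic constraints listed above are simple enough to be checked mechanically. The sketch below shows one way to do so in Python, assuming circular symbols described by a centre and a diameter in pixels; the symbol shape, the tuple layout and the function name are assumptions, and only the 5-pixel minimum comes from the text.

```python
import math
from itertools import combinations

MIN_SYMBOL_PX = 5  # minimum legible symbol size quoted in the text


def constraint_violations(symbols):
    """Check point symbols against the two legibility constraints.

    symbols: list of (x, y, diameter) tuples in screen pixels,
    assumed to be drawn as circles.
    Returns a list of ("too_small", i) and ("overlap", i, j) entries.
    """
    problems = []
    for i, (_, _, diameter) in enumerate(symbols):
        if diameter < MIN_SYMBOL_PX:
            problems.append(("too_small", i))
    for (i, a), (j, b) in combinations(enumerate(symbols), 2):
        centre_distance = math.hypot(a[0] - b[0], a[1] - b[1])
        # Circles overlap or touch when the centre distance does not
        # exceed the sum of their radii.
        if centre_distance <= (a[2] + b[2]) / 2:
            problems.append(("overlap", i, j))
    return problems


screen_symbols = [(0, 0, 10), (8, 0, 10), (30, 0, 4)]
found = constraint_violations(screen_symbols)
```

A pairwise check like this grows quadratically with the number of symbols, which is one reason why a hierarchical tessellation that guarantees separation by construction is attractive on a low-powered client.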

5 Using Spatial Tessellations for Real-Time Map Generalisation

5.1 Using Quadtrees

We use a quadtree to tessellate space [12], allowing rapid selection and exaggeration of points at different zoom levels. Moving vertically through the quadtree is analogous to zooming, while moving horizontally between nodes is comparable to panning. Each node contains a count of the number of objects existing within the block it represents. The count at the nodes for the current zoom level is used to generate a visualisation using simple selection and exaggeration rules (Fig. 4, lower part). The quadtree tessellates space until every point is assigned to a separate block. When zooming, a level is chosen that meets a minimum acceptable symbol size criterion. The upper part of Figure 4 illustrates the results of zooming/generalisation through the quadtree, where the maximum symbol size is constant and symbol size is a function of the number of observations. Symbols are also displaced, within the quadtree block, toward the centre of gravity of the block’s observations.

Fig. 4. Lower part: Example point set and corresponding quadtree (note the count stored at each node). Upper part: Successive generalisations of deer observations.
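The quadtree scheme can be sketched compactly in Python. The following is an illustrative reconstruction under assumptions, not the WebPark client code: the class and function names, the depth limit, the 20-pixel maximum and the linear size rule are hypothetical, and each symbol is placed exactly at the centre of gravity of its block’s observations rather than merely displaced toward it.

```python
class QuadNode:
    """Quadtree block storing the count and centre of gravity of its points.

    Points must lie in the half-open square [x, x+size) x [y, y+size).
    Subdivision stops when a block holds at most one point (or at
    max_depth, a guard against coincident points).
    """

    def __init__(self, x, y, size, points, depth=0, max_depth=8):
        self.depth = depth
        self.count = len(points)
        self.centroid = None
        if points:
            self.centroid = (sum(p[0] for p in points) / self.count,
                             sum(p[1] for p in points) / self.count)
        self.children = []
        if self.count > 1 and depth < max_depth:
            half = size / 2
            for dx in (0, 1):
                for dy in (0, 1):
                    cx, cy = x + dx * half, y + dy * half
                    inside = [p for p in points
                              if cx <= p[0] < cx + half and cy <= p[1] < cy + half]
                    self.children.append(
                        QuadNode(cx, cy, half, inside, depth + 1, max_depth))


def symbols_at_level(node, level):
    """One (centroid, count) symbol per non-empty block at the zoom level."""
    if node.count == 0:
        return []
    if node.depth == level or not node.children:
        return [(node.centroid, node.count)]
    symbols = []
    for child in node.children:
        symbols.extend(symbols_at_level(child, level))
    return symbols


def symbol_size(count, max_count, min_px=5, max_px=20):
    """Linear size rule between a minimum and a constant maximum size."""
    if max_count <= 1:
        return min_px
    return min_px + (max_px - min_px) * (count - 1) / (max_count - 1)


observations = [(1, 1), (2, 2), (1, 3), (6, 6), (7, 1)]
root = QuadNode(0, 0, 8, observations)
level1 = symbols_at_level(root, 1)
level1_counts = [count for _, count in level1]
```

Zooming then amounts to calling symbols_at_level with a deeper level: in this toy data set the single symbol at level 0 splits into three symbols at level 1 and into five individual observations at level 2, with the counts always summing to the total number of observations.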

The quadtree provides an efficient implementation which allows the application of simple rules to generalise thematic point data for panning and zooming whilst meeting basic cartographic constraints. However, points are aggregated according to their quadtree block, with no consideration given to relative adjacency to other points or underlying controls on point distribution.

5.2 Using Hierarchical Stream Ordering

The above example showed how a hierarchical data structure could be used effectively to generalise point data at different levels of zoom in real-time, by using an arbitrary regular tessellation. The main problem with this approach is that its representation of geographic relationships is limited. To describe more meaningful relationships between data, such as first order (e.g. an animal and its habitat) and second order (e.g. the arrangement of animal territories) spatial relations [11], [5], we must first establish hypotheses as to how data may be related. Here a simple conceptual example is presented where we postulate a relationship between deer and the landscape. Deer in mountainous terrain may be unable to cross steep ridges, or may find it difficult to do so, and thus individuals in one valley may be more likely to have some relationship with each other than with spatially nearer deer in separate valleys. We therefore tessellate the point dataset in Figure 4 using Strahler ordering and watersheds [14]. Resultant tessellations and visualisations are shown in Figures 5 and 6. The coarsest subdivision produces the same results as the quadtree in this case, whilst the next two zoom levels produce different results in terms of relative symbol size and symbol numbers.
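As a rough illustration of the stream-order hierarchy, the Python sketch below computes Strahler orders on a small stream network and aggregates observations to the basins of streams of a chosen order. It makes several assumptions not stated in the text: the network is given as a mapping from each stream segment to its upstream tributaries, every observation has already been assigned to the local catchment of a segment, and a basin is identified by the most downstream segment of its stream. All names and the toy network are hypothetical.

```python
def strahler_orders(upstream):
    """Strahler order per segment.

    upstream: {segment: [segments flowing into it]}. Segments without
    tributaries have order 1; where two or more tributaries of the
    highest order k meet, the order becomes k + 1, otherwise it stays k.
    """
    orders = {}

    def order(segment):
        if segment not in orders:
            tribs = [order(t) for t in upstream.get(segment, [])]
            if not tribs:
                orders[segment] = 1
            else:
                top = max(tribs)
                orders[segment] = top + 1 if tribs.count(top) >= 2 else top
        return orders[segment]

    for segment in upstream:
        order(segment)
    return orders


def aggregate_by_order(upstream, observed_segments, min_order):
    """Count observations per basin of streams with order >= min_order."""
    orders = strahler_orders(upstream)
    downstream = {t: s for s, tribs in upstream.items() for t in tribs}
    counts = {}
    for segment in observed_segments:
        # Walk downstream until the stream reaches the requested order,
        # then on to the last segment of that same-order stream.
        while segment in downstream and (
                orders[segment] < min_order
                or orders[downstream[segment]] == orders[segment]):
            segment = downstream[segment]
        counts[segment] = counts.get(segment, 0) + 1
    return counts


network = {
    "main": ["north", "south"],
    "north": ["n1", "n2"],
    "south": ["s1", "s2"],
}
deer = ["n1", "n1", "n2", "s1", "s2", "s2", "north"]
by_valley = aggregate_by_order(network, deer, min_order=2)
```

Here the seven observations collapse into one symbol per second-order valley ("north" with four, "south" with three), and into a single symbol for the whole third-order basin when min_order is 3. Unlike the quadtree blocks, observations on either side of a ridge are never merged before the level at which their valleys join.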

6 Conclusions

This paper reported on results of the WebPark project and had two main objectives. First, it sought to underline the importance of geo-enabling spatially relevant information in order to get the most out of existing information sources and meet the requirements of the user for location and context-dependent information provision. The analysis of information needs in WebPark revealed a mismatch between the existing information and the visitors’ needs for geo-enabled information. Typical LBS in the environmental domain can be expected to build on similar types of existing data (multimedia content and research observations) in order to minimise cost for data capture and integration. Hence, the lessons learned as well as the methods developed in WebPark can be generalised to similar contexts. The same holds true for the second objective of this paper, the development of techniques for real-time generalisation of thematic point (POI) data that were implemented in the project, as these methods are based on generic principles of tessellation hierarchies. The quadtree-based approach provides an efficient means of panning and zooming through multiple layers of data on small devices, but the regular tessellation of space pays no attention to preserving meaningful relationships within thematic data. To preserve such relationships we must form hypotheses about the nature of spatial relationships within the data and tessellate space accordingly. An example illustrating such an approach assumes that watersheds derived hierarchically using Strahler ordering provide a meaningful way of tessellating a landscape with respect to deer observations. Although the WebPark project is officially completed, the system continues to be developed and marketed through a spinoff company (www.camineo.com). This will not only allow further technical development of our real-time visualisation techniques but also further usability testing and cognitive experiments.

Fig. 5. Strahler stream ordering, and tessellation using watersheds of third (level 0), second (level 1) and first order (level 2) streams, respectively. Note that at every level the entire area is tessellated.

Original ungulate observation data

Fig. 6. Comparison of hierarchical generalisation techniques with the original observation data. Left: Using quadtree. Right: Using hierarchical stream ordering and associated stream basins.

Acknowledgements

The authors gratefully acknowledge the support of the WebPark team (www.webparkservices.info), the European Commission for supporting the WebPark research (Project Number IST-2000-31041) and the Swiss State Secretariat for Education and Research (SER) in relation to this project (BBW Nr. 01.0187-1). Eduardo Dias gratefully acknowledges the support of the Portuguese National Science Foundation (FCT/MCT) under the PhD grant SFRH/BD/12758/2003.

References

1. Abderhalden, W., Krug, K.: Visitor Monitoring in the Swiss National Park – Towards Appropriate Information for the Wireless Consumer. Proc. 10th Int. Conf. on Information Technology and Travel & Tourism, ENTER 2003, Helsinki, 29-31 January, 2003.
2. Barnes, S.: The Mobile Commerce Value Chain: Analysis and Future Developments. International Journal of Information Management 22, pp. 91-108, 2002.
3. Burrough, P.A., McDonnell, R.A.: Principles of Geographical Information Systems. (Oxford University Press: Oxford), 1998.
4. Cecconi, A., Galanda, M.: Adaptive Zooming in Web Cartography. Computer Graphics Forum, 21, pp. 787-799, 2002.
5. de Berg, M., Bose, P., Cheong, O., Morin, P.: On Simplifying Dot Maps. Computational Geometry, 27, pp. 43-62, 2004.
6. Dias, E., Gonçalves, A., Rodrigues, A.: Evaluation of Existing Information Services. WebPark report WP2100, EC Project Number IST-2000-31041, 2002.
7. Dias, E., Beinat, E., Scholten, H.: Effects of Mobile Information Sharing in Natural Parks. Proceedings of ENVIROINFO 2004 vol. 2, 18th International Conference Informatics for Environmental Protection. Tricorne Editions, Geneva, pp. 11-25, 2004.
8. Dias, E., Edwardes, A.J.: Information Flows in Nature Areas – Information Needs and Data supply for Location-based Services in Nature Areas. Proceedings GISPlanet 2005, Lisbon (CD-ROM), 2005.
9. Edwardes, A., Burghardt, D., Weibel, R.: WebPark – Location Based Services for Species Search in Recreation Areas. In Proceedings of the 21st International Cartographic Conference (Durban: ICC), CD-ROM, 2003.
10. Filli, F., Haller, R., Moritzi, M., Negri, M., Obrecht, J.-M., Robin, K., Schuster, R.: “Die Singvögel im Schweizerischen Nationalpark: Verbreitung anhand GIS-gestützter Habitatmodell”, Jahresbericht der Naturforschenden Gesellschaft Graubünden, Chur, Band 109, S.47-90, 2000.
11. O'Sullivan, D., Unwin, D.J.: Geographic Information Analysis (John Wiley), 2003.
12. Samet, H.: Applications of spatial data structures (Addison-Wesley: New York), 1990.
13. SSC: Topographic Maps – Map Graphics and Generalization. Cartographic Publication Series, Vol. 17 (Swiss Society of Cartography: Bern), CD-ROM, 2005.
14. Summerfield, M.A.: Global Geomorphology (Longman: London), 1991.
15. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: Conceptual modeling for ETL processes. Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP. ACM Press, New York, NY, USA, 2002.
16. Weibel, R.: Generalization of Spatial Data – Principles and Selected Algorithms. In: van Kreveld, M., Nievergelt, J., Roos, T., Widmayer, P. (eds.): Algorithmic Foundations of Geographic Information Systems. Lecture Notes in Computer Science, Vol. 1340, Springer Verlag, Berlin et al., 99-152, 1997.