Making GIS closer to end users of urban environment data
Andrea Aime
Environment Protection and Planning, Municipality of Modena
Flavio Bonfatti and Paola Daniela Monari
Department of Engineering Sciences, University of Modena and Reggio Emilia
email: [email protected]
ABSTRACT
A serious obstacle to the full deployment of GIS technology is the distance that still remains between it and the culture of the geographic information (GI) user. That culture is focused on spatial data and the operations to perform on them, not on how they are managed by a specific GIS package. The latest generation of tools has reduced but not eliminated this gap. The problem is addressed within the ISOLA European project, funded in the framework of the LIFE/Environment programme. The aim is to develop a client-server architecture that neatly separates the user interface from the underlying GIS engine. The former provides editing and visualisation functions that allow the user to express requests in a high-level language. The latter provides a set of primitives that are invoked by the interpreter of the user commands. Relevant points of the adopted approach are a graphic process description language to formalise the user-defined computation steps, a conceptual representation of spatial data that makes the user unaware of their physical structure, and a substantial independence of the GIS engine technology.
Keywords
End-user, process model, spatial analysis, technology independence.
1
INTRODUCTION
The three-year long ISOLA project (Information System for the Orientation of Local Actions) [10] is aimed at studying a unified way to organise and process urban environmental data in three fundamental activities [5]:
• Eco-balance. Assessment of risks in the urban environment and its components, and measurement of the pressure exerted by human activities and settlements.
• Eco-plan. Integration of environmental factors in urban planning according to proper methods, criteria and procedures.
• Eco-management. Support to decision-making by means of simulation tools that allow evaluating the impact of alternative scenarios.
The project intends to extend and generalize the experience acquired by the Modena municipality in several years of work in this field, with the objective of setting up an integrated approach that can be replicated in urban contexts differing in geographical, cultural and political conditions. The expected project results are:
• A sound methodology, expressed in the form of guidelines addressed to end users who plan to adopt a disciplined way of facing the environmental problem.
• A software package to support the methodology and make its application directly feasible and controlled by end users who are not experts in GIS technology. This means splitting the package into a high-level user interface and an underlying GIS engine, and hiding the mapping between user requests and calls to the engine functions.
• An experiment, obtained by applying the methodology to a real case. The experiment goals are to validate the methodology, test the user-friendliness and performance of the ISOLA package, and provide material and application examples for dissemination actions.
Through a preliminary investigation of geographic information (GI) user practice and needs [3] it was found that the map construction (derivation) process represents a critical problem for the user, since it is directly related to the way the user interacts with the spatial information. Most applications are classified by their users as decision support or procedure automation, and deriving new maps from the available spatial data is often the main application achievement. Furthermore, most users consider layers (coverages) as the most important spatial objects to deal with in territory representation and analysis, and assign a high rating, among non-functional requirements, to the user-friendliness of GIS applications.
Going deeper into the map derivation process it can be observed that the GI user requirements for a new application are essentially expressed through the views he/she has of the territory in terms of basic and derived maps [12]. In other words, designing a typical spatial analysis application (with its data and operations) corresponds, for the user, to defining a computation process necessarily based on the map metaphor. Thus, the map set actually becomes the database and works also as input data for the next analysis, to enable more advanced geo-processing applications. These considerations, which are confirmed by the ISOLA methodology, lead to the following main requirements for the software package:
• The GI user must be enabled to express the computation process directly through an intuitive high-level language.
• A conceptual representation of spatial data must leave the user unaware of their physical structure and coding.
• Process description must remain separated from process execution so as to allow a late binding to the data set.
• Process description must also be independent of the underlying GIS technology in order to make the GIS engines interchangeable.
The literature reports a limited number of studies in this field. They all share the idea that GIS technology is too far from the common technical culture of the “naive user” [7], but the pursued aim differs case by case. The proposal in [11] deals with the automatic generation of end user applications for several GIS platforms by means of the AIGLE visual CASE package. On the other hand, [6] considers the possibility of helping end users to work with different GIS tools by facing the problem at the conceptual level with the support of the UAPÉ environment. Besides trying, as we do, to become independent of the GIS technology, the two studies focus, respectively, on GUI and database design, while we are interested in supporting the map derivation process.
This paper presents the main features of the ISOLA package, with particular attention to the process definition and process execution phases. Section 2 introduces the adopted process model, an application example of which is proposed in Section 3. The process definition and execution procedure is discussed in Section 4, while Section 5 draws some concluding remarks.
2
THE PROCESS MODEL
A process is here intended as an ordered sequence of operations aimed at manipulating spatial data with the purpose of extracting additional meaning as a result. Data and operations, and the links that establish the operation execution order, are the basic process concepts. In other words, the process can be imagined as a directed acyclic graph whose nodes are, alternately, spatial data and operations. The extreme nodes of the graph are spatial data, in that every process starts from its input data and reaches the resulting output data.
The process model is a set of primitives that allows the user to express his/her intentions at the conceptual level, that is, without being distracted by implementation issues. In defining the process model we take advantage of the experience gained in contexts, such as manufacturing and business engineering, where the end user is normally asked to represent the activities to perform and to establish a partial order between them [4].
This section introduces the process model on which the ISOLA approach is founded. Spatial data visualisation, even though present in the ISOLA process model, is not considered in this paper, as it does not introduce significant innovation.
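The graph view of a process described above can be illustrated with a minimal sketch. It is not the ISOLA implementation: the class name `ProcessGraph`, the tuple encoding of nodes and the method names are assumptions made only for illustration. The sketch enforces the alternation between data and operation nodes, derives one admissible operator execution order, and checks that the extreme nodes are spatial data.

```python
from collections import deque


class ProcessGraph:
    """Hypothetical encoding: a node is ("layer", name) or ("op", name)."""

    def __init__(self):
        self.nodes = set()
        self.succ = {}  # node -> list of successor nodes

    def link(self, src, dst):
        # Links must alternate between spatial data and operations.
        if src[0] == dst[0]:
            raise ValueError("a link must join a layer and an operator")
        self.nodes.update((src, dst))
        self.succ.setdefault(src, []).append(dst)

    def operator_order(self):
        # Topological sort; fails if the graph is not acyclic.
        indegree = {node: 0 for node in self.nodes}
        for targets in self.succ.values():
            for node in targets:
                indegree[node] += 1
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for nxt in self.succ.get(node, []):
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.nodes):
            raise ValueError("the process graph contains a cycle")
        return [name for kind, name in order if kind == "op"]

    def extremes_are_layers(self):
        # Every process starts from input data and ends in output data.
        targets = {n for ts in self.succ.values() for n in ts}
        sources = self.nodes - targets
        sinks = {n for n in self.nodes if not self.succ.get(n)}
        return all(kind == "layer" for kind, _ in sources | sinks)


g = ProcessGraph()
g.link(("layer", "Vegetation"), ("op", "Ras2c01"))
g.link(("op", "Ras2c01"), ("layer", "CVegetation"))
g.link(("layer", "DEM"), ("op", "Slope01"))
g.link(("op", "Slope01"), ("layer", "Slope"))
print(g.operator_order())
```

Any order compatible with the links is acceptable; the sketch simply returns the one produced by a breadth-first topological sort.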
2.1
Data types
The spatial information on which the ISOLA package operates is organised into layers, a concept quite familiar to GI users. Every layer collects and geo-references, in a proper form, geometrically and semantically homogeneous territory entities. More precisely, the knowledge brought by a layer is split into two distinct components (G; {Ti}):
• G is the geometric component, localising the spatial entity on the territory (position, shape, orientation).
• {Ti} is the thematic component, expressing in the form of attribute values Ti the properties of each layer entity.
Depending on the nature of the layer entities and the representation mode of their geometry, the process model classifies the layers into four types: point layer (P), line layer (L), region layer (R), cell layer (C). The last type corresponds to the raster representation of the layer geometry, while the others refer to the vector representation. A parameter of the cell layer specifies the cell size. Although not completely aware of the deep implications of these coding techniques, GI users often have a clear idea of their most convenient use.
The graphic symbol for a layer is reported in Figure 1. Every layer is defined by the type (P, L, R, or C) and the name. The name is unique within the process and mandatory for the input and output layers, and for the intermediate results the user intends to recall, while the other layers can be represented by the type alone.
[Figure 1 diagram labels: R: Parcels; L: Roads; C]
Figure 1 – Layers
In general, a vector layer has zero, one or more associated attributes, while a cell layer is always mono-thematic. When present, the layer thematic component is represented in the form of an attribute table, where the attribute types refer to the usual elementary domains, such as integer, real, boolean, enumeration and the like. The attribute table has no graphical representation but is compiled through proper dialog boxes of the editor interface.
Early validation sessions with GI users show that all the above definitions are easily understood and often already known. On the other hand, they constitute the minimum set required to satisfy most end user informational needs. Other data types, such as time series and samples, and the relative operators, are not considered in this paper for the sake of simplicity, although they are present in the process model.
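As a hedged illustration of the layer concept of this section, the following sketch models a layer by its type, its optional name and the structure of its attribute table. The class and field names are assumptions, not part of the ISOLA package, and the mono-thematic rule is checked only as "at most one attribute" so that unnamed intermediate cell layers can still be declared.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LayerType(Enum):
    P = "point"
    L = "line"
    R = "region"
    C = "cell"


@dataclass
class Layer:
    type: LayerType
    name: Optional[str] = None          # mandatory only for some layers
    attributes: dict = field(default_factory=dict)  # attribute name -> domain
    cell_size: Optional[float] = None   # parameter of cell layers only

    def __post_init__(self):
        if self.type is LayerType.C:
            # Assumption: "mono-thematic" is checked as at most one attribute.
            if len(self.attributes) > 1:
                raise ValueError("a cell layer is mono-thematic")
        elif self.cell_size is not None:
            raise ValueError("cell size applies to cell layers only")

    def label(self):
        # Text of the graphic symbol: type and name, or the type alone.
        return f"{self.type.name}: {self.name}" if self.name else self.type.name


parcels = Layer(LayerType.R, "Parcels", {"Owner": "string", "Area": "real"})
anonymous = Layer(LayerType.C, cell_size=25.0)
print(parcels.label(), "|", anonymous.label())
```

The attribute names and the cell size used in the example are invented values.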
2.2
Operators
Lists of fundamental map analysis operators are available in the literature [2]. Here the process model operators are defined with the objective of making them easy for the end user to understand and apply. For instance, we include operators that perform simple interpolations, forecasts or pattern recognitions, but leave aside their sophisticated counterparts. A further simplification is obtained by limiting the variety of operators without reducing the expressive power of the model. This is done by means of:
• Polymorphism. The same name denotes operators that perform the same kind of operation on different data types. For instance, the buffering operation applies to points, lines and regions, through diverse algorithms but with the same objective. Thus, the model provides a unique BUFFER operator.
• Parameterisation. It means specifying the operator behaviour by choosing proper options. For instance, a BUFFER operator execution can be straightforward or conditioned by a look-up table that gives the buffering width as a function of the value of one or more attributes of the input layer.
Every operator has one or more input layers and, usually, one output layer. Role and type of the involved layers are characteristic of the single operator, as depicted in Figure 2 where the CMP operator is shown in connection with possible input and output layers (the execution flow is intended from top down).
[Figure 2 diagram labels: layers C: Slope, C: Vegetation, R: Erodibility; input roles A, B, C; operator CMP: COMP01; output role OUT; layer R: Erosion]
Figure 2 – Operator with the involved layers
The operator graphic symbol is inspired by the syntax of Function Block Diagrams [8], which has proved to be easily understandable by unskilled end users. It reports the operator type (e.g. CMP) and the operator name (e.g. COMP01), which is mandatory and unique within the process, together with the role names of the input layers (e.g. A, B, C) and output layers (e.g. OUT). Possible parameters are instead expressed in textual form (dialog box in the editor interface) as in the example reported in Figure 4.
According to the nature of the processed layers we identify three categories of operators:
• Vector operators. They apply to vector layers and produce vector layers. They perform overlay, buffering, selection of spatial or thematic properties, computation of new attributes, reclassification.
• Raster operators. They apply to cell layers and produce cell layers. They perform the operations of the raster algebra, interpolation, slope computation, filtering.
• Conversion operators. They transform vector layers into cell layers (according to different techniques), cell layers into vector layers, and cell layers into other cell layers with different cell size.
Table 1 reports a sample of the model operators, with the indication of the operator type, the category (column Ct: v for vector, r for raster, c for conversion), input and output layer types and roles, and possible parameters. Layer types are P, L, R, and C, but also V for P+L+R and X for R+C.

| Type | Ct | Description | Inputs | Output | Parameters |
|---|---|---|---|---|---|
| OVL | v | Topological overlay between two vector layers | IN: V; RE: R | OUT: V (same as IN) | Non-overlapping entities in OUT; selection of thematic data in OUT |
| BUF | v | Generates buffers from vector entities | IN: V | OUT: R | Buffering distance or look-up table |
| SEL | v | Selects entities according to a predicate based on thematic data | IN: V | OUT: V | Selection predicate |
| SSL | v | Selects entities from IN according to a topological property related to the entities of REF | IN: V; REF: V | OUT: V (same as IN) | Topological property: partly overlapped or entirely within or contains entire |
| CMP | v (r) | Computes a new attribute (a new cell layer) according to a given rule. It applies to the attributes of a vector layer (a set of cell layers, raster algebra) | IN: V, {C} | OUT: V, C | Computation rules, like simple formula, conditioned expression, ranges of a continuous value, weighting of a discrete value, … |
| SLP | r | Calculates a slope map from a DEM | IN: C | OUT: C | Output in degrees or percentage |
| INT | r | Simple interpolation based on weighted distance of sample points | IN: P | OUT: C | Attribute to interpolate |
| P2C, L2C, R2C | c | Point (line, region) to cell conversion operator | IN: P, IN: L, IN: R | OUT: C | Attribute to be used for conversion; how to handle the points (arcs, regions) affecting the same cell |
| CNT | c | Contouring algorithm extracting contour lines | IN: C | OUT: L | Contour line steps |

Table 1 – Sample of model operators
2.3
Macro operators
The GI user is allowed to extend the operator set by adding macro operators. A macro (compound) operator is a user-defined process that the user denotes with a type identifier and characterises in terms of roles and types of the input and output layers. The expressive power of macro operators arises from the possibility they offer to reuse processes, or portions of processes, in different user-defined applications. It means introducing the idea of modularity, which gives the user a disciplined mental model and shortens development and testing times.
The graphic symbol for a macro operator is very similar to that of a simple operator, as shown in Figure 3. In the example, the type is PotErod, the input roles are Vegetation, DEM and Erodibility, and the output roles are SlopeClass and SoilErosion. When used in a process, the macro operator is instantiated by its name, mandatory and unique, and linked to the actual input and output layers, as in the example.
[Figure 3 diagram labels: input layers R: NewVeget, C: DEM, C: Erodibility; input roles Vegetation, DEM, Erodibility; macro operator PotErod: PotentialErodib01; output roles SlopeClass, SoilErosion; output layers C: NewSlope, C: NewErosion]
Figure 3 – Example of macro operator
The benefits we expect from this graphical representation of the process, which is under evaluation by the ISOLA project user interest group, are several. Among them we recall:
• Full view of the process. Process structure, involved layers, precedences and results are immediately perceived as they are pictured in a unique schema.
• Easy process construction and change. The process can be simply defined and modified by a graphical editor with the typical office automation functionality.
• Deferred process execution. The process can be saved (possibly as a macro operation) and executed many times on different or periodically updated data sets.
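The operator polymorphism of Section 2.2 and the role and type indications of Table 1 can be sketched as a small signature table. This is only an illustration under stated assumptions: the dictionary layout, the function `resolve` and the variant names such as `BUF_L` are hypothetical and do not come from the ISOLA package; only the signatures of BUF, SLP, SSL and R2C are transcribed from Table 1.

```python
# Type groups from the Table 1 legend: V = P+L+R, X = R+C.
TYPE_GROUPS = {
    "P": {"P"}, "L": {"L"}, "R": {"R"}, "C": {"C"},
    "V": {"P", "L", "R"}, "X": {"R", "C"},
}

# A few signatures transcribed from Table 1 (role -> admitted type).
SIGNATURES = {
    "BUF": {"inputs": {"IN": "V"}, "output": "R"},
    "SLP": {"inputs": {"IN": "C"}, "output": "C"},
    "SSL": {"inputs": {"IN": "V", "REF": "V"}, "output": "same as IN"},
    "R2C": {"inputs": {"IN": "R"}, "output": "C"},
}


def resolve(op_type, actual_inputs):
    """Check the actual input types and pick a type-specific variant.

    actual_inputs maps each input role to the type (P, L, R or C) of the
    layer linked to it. Returns (variant name, output layer type); the
    variant naming scheme is an assumption made for this sketch.
    """
    signature = SIGNATURES[op_type]
    if set(actual_inputs) != set(signature["inputs"]):
        raise ValueError(f"{op_type}: input roles do not match the signature")
    for role, declared in signature["inputs"].items():
        if actual_inputs[role] not in TYPE_GROUPS[declared]:
            raise TypeError(f"{op_type}.{role}: expected {declared}")
    output = signature["output"]
    output_type = actual_inputs["IN"] if output == "same as IN" else output
    suffix = "".join(actual_inputs[role] for role in signature["inputs"])
    return f"{op_type}_{suffix}", output_type


print(resolve("BUF", {"IN": "L"}))   # buffering of a line layer
print(resolve("SSL", {"IN": "P", "REF": "R"}))
```

The user always writes the same operator type; the choice of the algorithm that matches the actual layer types is left to the software.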
3
A SIMPLE EXAMPLE
A process modelling example is reported in Figure 4. It expresses a territory analysis aimed at computing the soil erosion risk, in the form of a potential erosion map, from three input maps:
• C: DEM, a digital elevation model of the area of interest.
• R: Vegetation, a region layer representing classes of land use.
• C: Erodibility, a cell layer representing soil vulnerability, assumed to be already calculated (just to keep the example short).
The process converts the vegetation layer into the raster format, by the R2C operator, using a cell resolution equal to that of the DEM and the Erodibility layers. In parallel, it calculates the slope from the DEM layer using the SLP operator. Then, the three cell layers are reclassified by as many instances of the CMP operator (raster algebra) according to the reclassification tables reported below. Finally, we obtain the output layer (C: SoilErosion) by executing a weighted overlay of the three intermediate layers.
The transformation of this process into a macro operation (the one reported in Figure 3) consists of the following steps:
• In order to become reusable, the macro operation must be made independent of the input and output layers that appear in the process definition. It means that the macro operation shall include only the portion of the graph in Figure 4 delimited by the dashed line.
• In place of the input layers, the macro operation reports their types and the roles they play. These roles can be automatically derived from the names of the process input layers, or renamed by the user.
• In place of the output layers, the macro operation reports their types and roles. Note that even intermediate results can be identified as output layers, for instance C: SlopeClass, if it is worthwhile to visualise them or use them in other computations.
[Figure 4 diagram: R: Vegetation enters R2C: Ras2c01 (role IN), whose OUT is C: CVegetation; C: DEM enters SLP: Slope01 (role IN), whose OUT is C: Slope; C: CVegetation, C: Slope and C: Erodibility enter role A of CMP: Comp01, CMP: Comp02 and CMP: Comp03 respectively, whose OUT layers are an unnamed C layer, C: SlopeClass and an unnamed C layer; these three layers enter roles A, B and C of CMP: Comp04, whose OUT is C: SoilErosion]

Comp01
| Vegetation | VegetClass: Integer |
|---|---|
| Forest | 0 |
| RangeLand | 1 |
| Irrigated | 2 |
| Disturbed | 3 |

Comp02
| Slope (from) | Slope (to) | SlopeClass: Integer |
|---|---|---|
| 0 | 10 | 1 |
| 11 | 20 | 2 |
| 20 | SUP | 3 |

Comp03
| Erodibility (from) | Erodibility (to) | ErodClass: Integer |
|---|---|---|
| 0.1 | 0.2 | 1 |
| 0.2 | 0.35 | 2 |
| 0.35 | 0.5 | 3 |
Figure 4 – The potential erosion computation process
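The reclassification and weighted overlay steps of this example can be traced on a tiny grid. The sketch below is illustrative only: the function names are ours, the vegetation pairing is our reading of the tables of Figure 4, the rule "the first matching range wins" is an assumption for the shared range boundaries, and the overlay weights (1, 2, 1) are invented because the paper does not report them.

```python
SUP = float("inf")

# Reclassification tables as read from Figure 4.
VEGET_CLASS = {"Forest": 0, "RangeLand": 1, "Irrigated": 2, "Disturbed": 3}
SLOPE_CLASS = [(0, 10, 1), (11, 20, 2), (20, SUP, 3)]
EROD_CLASS = [(0.1, 0.2, 1), (0.2, 0.35, 2), (0.35, 0.5, 3)]


def reclass_lookup(grid, table):
    """Reclassify a grid of discrete values through a look-up table."""
    return [[table[value] for value in row] for row in grid]


def reclass_ranges(grid, ranges):
    """Reclassify a grid of continuous values; first matching range wins."""
    def classify(value):
        for low, high, cls in ranges:
            if low <= value <= high:
                return cls
        return None  # value outside every range
    return [[classify(value) for value in row] for row in grid]


def weighted_overlay(grids, weights):
    """Cell-by-cell weighted sum of grids with identical shape."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(w * g[r][c] for g, w in zip(grids, weights)) for c in range(cols)]
        for r in range(rows)
    ]


# Invented 2 x 2 input grids (CVegetation, Slope, Erodibility).
cvegetation = [["Forest", "Disturbed"], ["Irrigated", "RangeLand"]]
slope = [[5, 25], [15, 8]]
erodibility = [[0.15, 0.4], [0.3, 0.12]]

veget_class = reclass_lookup(cvegetation, VEGET_CLASS)    # Comp01
slope_class = reclass_ranges(slope, SLOPE_CLASS)          # Comp02
erod_class = reclass_ranges(erodibility, EROD_CLASS)      # Comp03
HYPOTHETICAL_WEIGHTS = (1, 2, 1)                          # not from the paper
soil_erosion = weighted_overlay(
    [veget_class, slope_class, erod_class], HYPOTHETICAL_WEIGHTS
)                                                         # Comp04
print(soil_erosion)  # [[3, 12], [8, 4]]
```

With these inputs the steep, disturbed cell gets the highest score, which is the behaviour one would expect from a potential erosion map.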
4
PROCESS MANAGEMENT
The ISOLA package supports all the activities that lead the end user from defining a process to executing it on selected layers, up to storing and visualising the achieved results. In this section we describe software functionality and data flows.
The system is based on the traditional client-server architecture (Figure 5), where the client is in charge of the interaction with the user and the server manages persistent data and executes computations. This separation ensures the complete independence of the client functions from the underlying GIS technology, and allows implementing the ISOLA environment on top of different GIS platforms.
The client side of the architecture includes two main components, namely the Process Editor and the Execution Environment. The former supports the user in process description and update through its graphical interface; the latter translates the process into a pseudo-code, which is subsequently interpreted and sent to the server in the form of commands. In turn, the server side realises the Virtual Machine that provides data access and computation functions able to execute the client commands.
[Figure 5 diagram labels: Process editor; Execution environment (Compiler, Interpreter); PROCESSES; Virtual machine; DATA; Raster engine; Vector engine]
Figure 5 – The ISOLA architecture
4.1
Process definition
One of the most frequent user requirements concerns the possibility of executing the same process many times on different data sets. At least three typical situations justify this request:
• The process is often constructed and tuned on a data set describing a small portion of territory, then it can be applied to the layers representing the whole area of interest.
• The process is executed periodically on layers representing the same territory at different times, so as to obtain updated answers.
• The process is made available to other organisations that share the same problems, and these organisations execute it on their respective data sets.
In order to fulfil these requirements the process definition should be independent of the layers that will be actually processed. It means that the layers mentioned in the process graph should be simply denoted by local names, with no reference to the physical file names. We call this kind of process abstract; the process pictured in Figure 4, in particular, is abstract. An abstract process involves layers that are characterised by local names (mandatory only for the input and output layers) and types (mandatory). In addition, each input layer is characterised by the structure of the relative attribute table, expressed in terms of a set of pairs (attribute name, attribute type). The attribute table structures of intermediate and output layers are easily derived from those of the input layers by taking into account the semantics of the applied operators.
However, there are some reasons that make the definition of abstract processes difficult or inconvenient. Among them we recall:
• Not all GI users have the abstraction capability that is required to define abstract processes, while they are usually able to describe operations on clearly identified layers.
• Certain processes are defined to be applied to predefined sets of layers, that is, they do not need to be reused in other circumstances and on other data sets.
• It is often easier to derive the input layer attribute tables from those of actual layers, stored in the database, than to declare them to the Process Editor.
To this purpose, the user is allowed to define processes that already refer to database files, by including their physical identifiers as input layer names and directly adopting their attribute table structures. We call this kind of process instantiated. Instantiated processes can be stored as they are or used as a rough definition from which the Process Editor can automatically derive the corresponding abstract process.
Once the editing session is terminated, the user-defined process, no matter if abstract or instantiated, is stored in a proper database partition. From this partition the process can be loaded back to be modified, or to derive a macro operation from it, or to instantiate it before an execution.
The permanent form chosen for the process is textual. According to the Function Block Diagram metaphor [8] the process text is made of two sections, namely a declarative section and an algorithmic section. The former introduces input layers, output layers and work (intermediate) layers with their names and types. The latter represents every operator as a call to the corresponding primitive. A simple example is given by:
    R2C: Ras2c01 (IN = Vegetation)
    CVegetation = Ras2c01.OUT
expressing the first operator of the process in Figure 4. The process text is obviously enriched by the operator parameters. In addition, the co-ordinates of the graphical symbols are associated with operators and layers, as information needed to rebuild the graphical representation whenever the process is recalled by the Process Editor for modification purposes.
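The textual form of an operator call, and the step from an abstract to an instantiated process, can be sketched as follows. The class `OperatorCall`, the function `instantiate`, the comma used to separate several inputs and the physical identifier `veg_1999` are assumptions introduced for illustration; the paper shows only the single-input R2C example.

```python
from dataclasses import dataclass


@dataclass
class OperatorCall:
    op_type: str
    name: str
    inputs: dict    # input role -> layer name
    outputs: dict   # output role -> layer name

    def to_text(self):
        # One call line, then one assignment line per output role.
        args = ", ".join(f"{role} = {layer}" for role, layer in self.inputs.items())
        lines = [f"{self.op_type}: {self.name} ({args})"]
        lines += [f"{layer} = {self.name}.{role}" for role, layer in self.outputs.items()]
        return lines


def instantiate(calls, binding):
    """Replace local layer names with physical identifiers.

    Names missing from the binding (e.g. work layers) are kept as they are.
    """
    return [
        OperatorCall(
            call.op_type,
            call.name,
            {role: binding.get(layer, layer) for role, layer in call.inputs.items()},
            {role: binding.get(layer, layer) for role, layer in call.outputs.items()},
        )
        for call in calls
    ]


abstract = [OperatorCall("R2C", "Ras2c01", {"IN": "Vegetation"}, {"OUT": "CVegetation"})]
print("\n".join(abstract[0].to_text()))

bound = instantiate(abstract, {"Vegetation": "veg_1999"})  # hypothetical file id
print("\n".join(bound[0].to_text()))
```

Keeping the binding outside the process text is what allows the same abstract process to be executed later on different data sets.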
4.2
Process execution
The user executes a process through the Execution Environment. The process is first recalled from the database and compiled, that is, translated into a lower level pseudo-code suitable to be interpreted and executed on the Virtual Machine. To this purpose, each process operator is mapped onto one to N Virtual Machine instructions, and specified according to the given parameters. More precisely:
• Polymorphism is eliminated by generating those instructions that correspond to the data types of the actual input layers.
• Operator parameters indicating a particular algorithm are taken into account by generating those instructions that realise the selected function.
The Compiler operates only on instantiated processes, so as to produce a pseudo-code containing all the references to the involved database files. Thus, when the execution of an abstract process is launched the Compiler asks the user to associate physical identifiers with the input layers and, possibly, to indicate how to store the output layers. Only after this step is concluded does the compilation function start. The main Compiler controls are:
• Ascertain that file identifiers correspond to layers in the database, and that their types and attribute tables are compatible with the process definition. Attribute table compatibility is ensured when the attribute set of the physical layer includes that of the relative input layer: the physical layer must be a specialisation of the relative input layer.
• Check syntactic errors, consisting essentially in the possibility that the process has not been completely defined. This occurs when input or output roles of some operators are not yet associated with layers, but the error is fatal only in the case of a missing input layer.
• Check semantic errors. A fatal error is detected if the type of an operator role does not match that of the associated layer or if the attribute table of an input layer does not contain the attributes the operator works on.
• Find out unused attributes. During the compilation phase it is possible to verify whether the attributes of the input layers are all used by the process operators. If some of them turn out to be unused the Compiler issues a warning message, even though the compilation ends successfully.
The Interpreter role is twofold: scanning the pseudo-code instructions one by one to prepare calls to the Virtual Machine functions, and managing the returned information to inform the user about the execution progress or possible problems. The Interpreter is the last of the ISOLA software modules that remains independent of the underlying GIS technology. It is the main task of the Virtual Machine to ensure such independence.
The Virtual Machine provides the set of functions that realise the pseudo-code instructions. These functions use the primitives made available by the GIS Engine and hence constitute the software layer that encapsulates the chosen GIS technology. It means that the Virtual Machine is the only code to rewrite for adapting the ISOLA package to another GIS technology. For the sake of generality we show the GIS Engine split into two parts, namely the Raster Engine and the Vector Engine, but they could correspond to a unique GIS platform as well as to more than two platforms. For the ISOLA project the GIS engine is constituted by the GRASS package [9] and an extension to the Autodesk World package [1].
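The Compiler controls listed in this section can be sketched as a single checking function. The sketch is a hypothetical illustration, not the ISOLA Compiler: the function name, the data layout and the message texts are assumptions. It covers attribute table compatibility (the physical attribute set must include the declared one), the fatal error for a missing input layer, the check on the attributes an operator works on, and the warning for unused attributes.

```python
def compile_checks(declared, database, binding, input_links, used):
    """Return (errors, warnings) for an instantiated process.

    declared:    local input layer name -> {"type": str, "attributes": dict}
    database:    physical identifier    -> {"type": str, "attributes": dict}
    binding:     local input layer name -> physical identifier
    input_links: list of (operator name, input role, layer name or None)
    used:        local input layer name -> set of attribute names the
                 operators work on
    """
    errors, warnings = [], []
    for local, spec in declared.items():
        physical = database.get(binding.get(local))
        if physical is None:
            errors.append(f"{local}: no corresponding layer in the database")
            continue
        if physical["type"] != spec["type"]:
            errors.append(f"{local}: layer type mismatch")
        # The physical layer must be a specialisation of the input layer.
        if not spec["attributes"].items() <= physical["attributes"].items():
            errors.append(f"{local}: attribute table not compatible")
    for operator, role, layer in input_links:
        if layer is None:
            errors.append(f"{operator}.{role}: missing input layer")
    for local, spec in declared.items():
        known = set(spec["attributes"])
        needed = used.get(local, set())
        for attribute in sorted(needed - known):
            errors.append(f"{local}: attribute {attribute} not in attribute table")
        for attribute in sorted(known - needed):
            warnings.append(f"{local}: attribute {attribute} is unused")
    return errors, warnings


declared = {"Vegetation": {"type": "R", "attributes": {"LandUse": "enumeration"}}}
database = {
    "veg_1999": {  # hypothetical physical identifier
        "type": "R",
        "attributes": {"LandUse": "enumeration", "Owner": "string"},
    }
}
binding = {"Vegetation": "veg_1999"}
links = [("Ras2c01", "IN", "Vegetation")]

print(compile_checks(declared, database, binding, links, {"Vegetation": {"LandUse"}}))
print(compile_checks(declared, database, binding, links, {}))
```

In the second call the compilation would still succeed, since an unused attribute produces only a warning.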
5
CONCLUSIONS
We have presented the main features of the ISOLA package, which is presently under development. The package alpha version will be available by the end of 1999 and validated in the field during the following year. However, a preliminary version of the Process Editor interface is already undergoing usability tests by end users from the Modena Municipality and from other municipalities participating in the project Interest Group.
The paper focuses on the idea of process modelling, which we consider critical for most GI user activities. Ease of use is pursued through a number of choices:
• The model is based on a limited number of intuitive primitives, and adopts a graphic metaphor that is considered suitable for unskilled users.
• The model is procedural, as this is the most familiar way of describing actions, precedences and results (and in line with the ISO 900x standard).
• Process definition and process execution are kept separate, so as to strengthen the process reuse potential.
We expect both direct and indirect benefits from the adoption of the ISOLA approach. A direct benefit is that procedures now hidden in the minds of single experts are made explicit, with the possibility to transfer, compare and consciously revise them. An indirect benefit is the introduction of a new way of working, which can be extended to other organisational aspects that can take advantage of an explicit representation of the activities to carry out and keep under control.
6
REFERENCES
[1] http://www.autodesk.com/products/world/
[2] J. K. Berry. Fundamental operations in computer-assisted map analysis, International Journal of Geographic Information Systems, 1, 2, 1987.
[3] F. Bonfatti et al. Guidelines for Best Practice in User Interface for GIS, EP 21580 BestGIS, Marconi, 1998.
[4] F. Bonfatti, P. D. Monari, P. Paganelli. Resource free and resource dependent aspects of process modelling: a rule-based conceptual approach, International Journal of Computer Integrated Manufacturing, Taylor & Francis, 1, 1, 1998.
[5] F. Bonfatti, P. D. Monari, C. A. Muratori. The ISOLA project: a novel approach to urban environment data use, UDMS ’99 International Symposium, Venice, 1999.
[6] J. L. DeOliveira, F. Pires, C. Bauzer Medeiros. An environment for modeling and design of geographic applications, GeoInformatica, 1, 1, 1997.
[7] M. Egenhofer, D. Mark. Naive geography, COSIT ’95 International Conference, LNCS n. 988, Springer Verlag, 1995.
[8] FBD language, in IEC 1131-3 Programming Languages, CEI/IEC, 1993.
[9] http://www.baylor.edu/~grass/
[10] http://www.comune.modena.it/~isola The ISOLA project Web site.
[11] A. Lbath. A CASE tool for urban applications based on visual design, UDMS ’99 International Symposium, Venice, 1999.
[12] A. A. Lovett. Spatial analysis, in Geographic Information Systems: volume 2 GIS technology, Frank A. U. Ed., Longman, 1995.