GeoInformatica 1, 29-58 (1997). © 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
An Environment for Modeling and Design of Geographic Applications

JULIANO LOPES DE OLIVEIRA, FÁTIMA PIRES AND CLAUDIA BAUZER MEDEIROS
Instituto de Computação, IC-UNICAMP, 13081-970 Campinas-SP, Brazil
{juliano, fpires,cmbm}@dcc.unicamp.br

Received June 13, 1996; Revised October 4, 1996; Accepted October 15, 1996
Abstract. This paper presents UAPÉ, a computational environment for modeling and designing environmental geographic applications. UAPÉ is aimed at end-users who are experts in their application domain but lack an adequate background in software engineering or database design, and thus cannot take full advantage of available GIS tools. Its goal is to reduce the impedance mismatch between the end-users' view of the world and its implementation in Geographic Information Systems. The environment has been designed and implemented as an auxiliary layer to be coupled to a GIS. The major features of this layer are: it has an open architecture, independent of any specific GIS, so that it can be coupled to different systems; it allows the user to deal only with the conceptual view of geographic reality, abstracting away implementation details; and it supports a geographic application design methodology that is fully integrated with a high-level semantic data model, so that there is no impedance mismatch between application design and data modeling.

Keywords: Geographic Information Systems, database modeling, geographic software design methodology
1. Introduction
The number and variety of application domains for Geographic Information Systems (GIS) are growing exponentially, due to advances in supporting technologies and the decrease in hardware costs. The motivation for the work presented in this paper is that, in spite of this growing demand, existing GIS fail to provide an adequate environment for end-users. More specifically, current GIS provide efficient operations (e.g., storage and retrieval) on geographic data, but they have major drawbacks in supporting the basic activities (modeling and analysis) from the user's point of view. Research and development involving GIS is mostly of two kinds: that performed by computer scientists, and that conducted by end-users (e.g., cartographers, geographers, social scientists, engineers). Recently, there have been efforts to bridge the gap between these two worlds. This is reflected in conferences and research papers that portray joint work, as well as in the fact that major projects involving GIS are highly multidisciplinary. Nevertheless, it is evident that different points of view exist, and that they influence the approach to the problem of processing georeferenced data. In fact, whereas computer scientists often lack a good understanding of end-users' requirements, users seldom have the computer science background to take advantage of the facilities offered by GIS. As remarked by [20], "most GIS require extensive training,
DE OLIVEIRA, PIRES AND MEDEIROS
not only to familiarize users with terminology of system designers, but also to educate them in formalizations used to represent geographic data and derive geographic information". We claim that most of the problems end-users face in using GIS originate from two main issues:

* Limitations of the GIS interface and model: there is often a mismatch between the user's vocabulary and understanding of the world and the modeling and development facilities offered by a given GIS. Thus, in order to develop applications or to query data, end-users are forced to distort their view of the world to fit the system's framework, language and data structures.
* Limitations due to the users' lack of training in Computer Science: users are experts in their field, but seldom have proper training in software engineering or in database modeling and design techniques, which would enable them to design better applications and to enhance data and software reuse.
In this paper, we present the architecture of UAPÉ, an environment being built at the Institute of Computing, University of Campinas (IC-UNICAMP), which is intended to diminish these problems by helping the end-user work with different GIS tools through a coherent conceptual framework. One of the novel aspects of this work is that it integrates geographic data and process modeling, using software engineering and database technologies. UAPÉ provides users with facilities for application design and geographic data modeling, based on combining a high-level geographic semantic data model, called GMOD, with an application design methodology. The environment is open and has been designed to be used together with different GIS, as a layer between the end-user and the GIS. UAPÉ does not yet provide an interface to a commercial GIS; we are now designing one such module. Instead, it is based on a geographic database prototype developed on top of the O2 object-oriented database management system [39]. The ease of coupling UAPÉ to a GIS depends on the degree of openness of the GIS itself, and this is in fact one of the main problems we currently face. UAPÉ is directed towards supporting two types of end-user activity: design of geographic (environmental) applications and databases; and manipulation of the data stored in these databases. Using this environment, end-users will be able to design applications geared to their needs without having to learn GIS implementation concepts. The design facilities of UAPÉ guide the user through a series of steps that ensure proper documentation of design decisions and allow reuse and integration of existing data sources. These steps are directed by a methodology for designing environmental applications that has been developed at UNICAMP and validated by end-users, for distinct application domains, over the last year.
Data manipulation activities are supported by a high-level semantic data model, also used by the design methodology, which presents users with a view of the world that is closer to their reality. This model has also been tested against user needs over the last two years, having been used in the definition of different geographic databases.
AN ENVIRONMENT FOR MODELING AND DESIGN OF GEOGRAPHIC APPLICATIONS
The name of the environment reflects its functionalities. UAPÉ is an indigenous word for a water lily of the Amazon region, under whose huge leaves thrives a complex ecosystem. Analogously, UAPÉ (geo-User Analysis and Project Environment) presents its users with a friendly interface that helps them manipulate the complex structures of GIS. The remainder of this paper is organized as follows. Section 2 presents some of the concepts used throughout the paper, describes some of the main activities that GIS must support, and shows existing approaches to performing these activities. Section 3 introduces UAPÉ, first describing the data model and the methodology guidelines, and then presenting its architecture. Section 4 shows, through a real application, how UAPÉ can support the activities presented in Section 2. Section 5 presents the conclusions of this work, discussing the status of the present implementation and the research activities involving UAPÉ.

2. Modeling and analysis activities in current GIS
The problems of discussing GIS begin with defining the term itself (e.g., [5], [17], [37], [55]). There are countless definitions of GIS, each based on the type of user and application domain [37]. The most general definition [26] would be "a digital information system whose records are somehow geographically referenced". The database approach defines a GIS as a non-conventional geographic database that "supports management of spatial data". The toolbox view considers a GIS to be a set of tools and algorithms to manipulate geographic data. The process-oriented view sees a GIS as a collection of integrated subsystems, in which data go through a sequence of transformation processes. Finally, the application (or utilization) definition characterizes a GIS according to the kind of problem solved and the data types manipulated. Depending on the definition, different issues are considered, reflecting the multiplicity of possible uses of GIS technology. This paper combines the database and process-oriented views and takes into account the following properties: GIS perform data management and retrieval operations for georeferenced data, which are time- and space-specific; the data that must be integrated into a GIS come in distinct formats, from different sources and geographic locations, and are captured by various types of devices; and these data occupy considerable amounts of space and require specialized analysis and output formatting operations. Fundamental to our point of view is the fact that geographic data are stored in a geographic database. In fact, end-users assume a database to be a fundamental part of a GIS, but their notion of what a "database system" is remains very fuzzy. A geographic database is a repository of information collected empirically about real-world phenomena [29].
The creation of a geographic database goes through several stages: modeling and design; collection of data about the phenomena identified as relevant during design; correction of errors introduced during data collection; and data georeferencing. Once the database has been created, users can develop their applications and continue loading data into it, constructing their particular views of reality by means of data transformations. User interaction with GIS can be characterized by two main activities: specification, which corresponds to modeling and designing geographic databases and applications; and
operation, which consists of browsing and querying the underlying data in order to derive information and to build or validate models of the world. This section is divided into three parts, which discuss these facets of end-user activities: the first part (2.1) considers the modeling and design activity, emphasizing data modeling aspects; the second part (2.2) presents our classification of the functionalities that must be provided by GIS; and the third part (2.3) shows some solutions that have been developed to help users perform these activities.

2.1. Modeling geographic phenomena
From a macro point of view, the development of a geographic application can be considered in three major steps: real-world modeling; geographic database specification and loading; and the operation phase. Real-world modeling comprises data and process modeling, and corresponds to selecting, abstracting and generalizing the entities of interest to the user, as well as showing how they vary through time. The output of the modeling activity directs the definition of the geographic database, and also specifies the function libraries and model parameters that are to be used together with the data stored in the database. If users have a software engineering background, this modeling phase follows some methodology and documentation procedures. Finally, the operation phase concerns the actual use of the database and libraries, producing different outputs according to users' requirements. (Notice that in this paper the term database does not necessarily denote data managed by a database management system. Rather, it refers to data stored in a way that is manageable by the GIS data management system.) Process modeling concerns building a mathematical model that describes operations involving the stored data representations, and includes the simulation of natural phenomena [7]. Mathematical models can be deterministic or statistical. Furthermore, each can be either steady-state or dynamic, depending on whether it contains terms that vary with time [57]. Process modeling begins with the selection of phenomena and of an appropriate mathematical model that can describe and simulate them. It indicates the data sources that must be collected and combined in order to present an adequate view of reality. Once data are collected and stored, the process model is "invoked" [45], i.e., a sequence of algorithmic transformations is applied to the data. Data and mathematical model are calibrated and refined in an iterative process until a given quality level is reached [6].
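As an illustration of the iterative calibration step, the following Python sketch fits one parameter of a deterministic, steady-state model to observations, stopping as soon as the root-mean-square error (RMSE) drops below a chosen quality level. The model (runoff = k × rainfall), the gradient-descent update, the tolerance and the sample values are all hypothetical assumptions made for this example; they are not taken from the paper or from [6].

```python
import math
from typing import Sequence, Tuple

def runoff_model(rainfall: float, k: float) -> float:
    """Hypothetical deterministic, steady-state model: runoff = k * rainfall."""
    return k * rainfall

def rmse(k: float, rainfall: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error of the model against the observations."""
    errors = [runoff_model(r, k) - o for r, o in zip(rainfall, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def calibrate(rainfall: Sequence[float], observed: Sequence[float],
              k: float = 0.0, learning_rate: float = 1e-4,
              tolerance: float = 0.5, max_iter: int = 10_000) -> Tuple[float, float, int]:
    """Refine k iteratively until the RMSE reaches the target quality level.

    Returns (calibrated k, final RMSE, iterations used)."""
    n = len(rainfall)
    for iteration in range(max_iter):
        error = rmse(k, rainfall, observed)
        if error <= tolerance:  # quality level reached: stop refining
            return k, error, iteration
        # Derivative of the mean squared error with respect to k.
        grad = 2.0 * sum((runoff_model(r, k) - o) * r
                         for r, o in zip(rainfall, observed)) / n
        k -= learning_rate * grad
    return k, rmse(k, rainfall, observed), max_iter

# Hypothetical observations (rainfall in mm, measured runoff in mm).
rainfall = [10.0, 20.0, 30.0, 40.0]
observed = [3.1, 5.9, 9.2, 11.8]
k, error, steps = calibrate(rainfall, observed)
print(f"k = {k:.3f}, RMSE = {error:.3f} after {steps} iterations")
```

The step size is small relative to the curvature of the error surface, so each iteration moves k closer to the best-fitting value without overshooting. A realistic process model would have many parameters and would more likely rely on a dedicated optimization routine.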
Process modeling and numerical simulation vary according to the application domain and to the extent and scale of the observed phenomena [30]. Process models run on data that have been organized according to a data model. A data model provides the tools and formalisms needed to describe the logical organization of a database, as well as to define the allowed data manipulation operations. Data modeling is the process by which the real world is measured and captured in discrete database records. Data modeling is a well-known field in computer science, combining notions of databases and software engineering. However, traditional data modeling techniques are not adequate for dealing with geographic information. Difficulties arise from the fact that most geographic data must be considered with respect to the location where they are valid, the
time of the observation and their accuracy (e.g., [13], [16], [22], [27], [34], [52]). Creating a geographic data model is a complex task because it involves representing, in discrete form, the continuous and analogue space of reality. Initial work on data modeling for geographic applications dealt primarily with geometrical and spatial data structures and their organization (including the notion of topology). The early data models directly reflected the underlying geometries and were closely linked to basic data structures. As stressed by [51], GIS developers are forced to define application entities in terms of GIS internal structures. This causes an impedance mismatch between the end-users' view of the application and the GIS implementation needs. For instance, depending on the GIS, the same real-world concept is called a theme, category, layer, information plan, coverage or map. This situation is evident in a large number of systems available today, in which the user refers directly to arc-node structures, in the case of vector-oriented systems, or to grids or quadtrees, in raster-based ones. As a consequence, the modeling procedures mix application needs with constraints imposed by the internal structures. Data modeling plays a critical role in determining the usability and adequacy of a system [29]. This concern has led to a number of conceptual formulations for geographic data models, and to a growing interest in object-oriented concepts, of which SAIF [61] is one of the most comprehensive. The appropriateness of object-oriented data models for geographic applications has been advocated extensively (e.g., [2], [19], [43], [50], [51], [54], [60], [64], [65]). Their main advantage is that they allow users to write incremental and reusable specifications, thanks to the properties of inheritance, composition and encapsulation.
Furthermore, unlike with other models, designers can also describe the behavior of real-world entities, which allows better modeling of the dynamics inherent to real-world phenomena. In fact, the object-oriented paradigm induces the combination of process and data modeling, by allowing encapsulation of data and behavior. From a higher-level point of view, end-users see geographic reality according to two basic models: the field view and the object view [24], [26]. (Though these are normally called field and object models, we choose the term "view" to differentiate them from object-oriented modeling.) These views are implemented by different GIS data structures and representation-specific operations, and these implementations have become conflated with the issue of modeling [16]. In fact, the field versus object views of the world are sometimes described in terms of raster versus vector "data models", which, rather than being models from a database point of view, are closer to implementation structures. The field view sees the world as a continuous surface (layer) over which features vary in a continuous distribution (e.g., atmospheric pressure). Each layer corresponds to a different theme (e.g., vegetation, soil). Individual entities are created in the modeling process and do not exist independently [28]. The emphasis is on the contents of these areas rather than on their boundaries. The object view treats the world as a surface littered with recognizable objects that have an identity of their own and exist independently of any definition (e.g., a given river). In this view, two objects can occupy the same place. Database entities correspond to these recognizable objects, which are defined a priori.
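The contrast between the two views can be sketched in Python. In the field view, a theme behaves like a function that returns exactly one value for any location; in the object view, the database holds entities with their own identity, and several of them may cover the same place. The pressure surface, the entity names and the use of bounding boxes as simplified geometries are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Field view: the theme is a continuous surface; every location has one value.
def pressure_field(x: float, y: float) -> float:
    """Hypothetical atmospheric pressure (hPa) varying smoothly over space."""
    return 1013.0 - 0.5 * x + 0.2 * y

# Object view: recognizable entities with an identity, defined a priori.
@dataclass(frozen=True)
class GeoObject:
    oid: str
    name: str
    bbox: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), simplified geometry

    def covers(self, x: float, y: float) -> bool:
        xmin, ymin, xmax, ymax = self.bbox
        return xmin <= x <= xmax and ymin <= y <= ymax

objects = [
    GeoObject("r1", "river corridor", (0.0, 0.0, 10.0, 2.0)),
    GeoObject("p1", "protected area", (5.0, 0.0, 15.0, 8.0)),
]

def objects_at(x: float, y: float) -> List[str]:
    """Object view: a location may be occupied by zero, one or several objects."""
    return [o.oid for o in objects if o.covers(x, y)]

print(pressure_field(2.0, 3.0))  # a single value for the location
print(objects_at(7.0, 1.0))      # two objects share this place
```

Note that asking the field "what is here?" always yields one value, whereas asking the object collection may yield several identities or none.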
From a software engineering perspective, these models may be compared to top-down versus bottom-up views of the world. The field view describes phenomena from a high-level view, and different regions and entities "pop up" as part of the analysis process (e.g., in a classification procedure). The object view, instead, is bottom-up: it builds an overall view of a geographic region by first defining the objects of interest and then uniting them by means of spatio-temporal relationships. Software engineering design methodologies often advocate mixing both approaches. Translated to the GIS context, this follows the recommendation of [16] that the modeling of geographic reality should combine both field and object views. Field and object views are translated into different representation models. Field data are usually processed in tesseral format (spatial entities described as polygonal units of space, called cells, in a matrix). One cell contains one thematic value (i.e., there cannot be two types of soil in a given cell). Cells may have different shapes; square cells are called pixels. The raster format (often used as the generic name for tesseral data) is just one special type of tessellation, with a rectangular grid format organized in line-scan order. In this case, coordinates are not stored, but rather derived from the position of the cells in the scan order. Object data are processed as points, lines and polygons (the vector format model), using lists of coordinate tuples. Boundaries of regions are stored precisely, and several attributes can be associated with a single element. Networks are a special case of vector data, in which elements are sets of links and nodes. They are used for facility management and network analysis (e.g., in transportation or hydrology). The vector format is usually more adequate for representing man-made artifacts, as in AM/FM applications, whereas the use of tesseral data is widespread in environmental applications.
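The difference between the two representation formats can be made concrete with a small Python sketch. In the raster case, only the cell values are stored, in line-scan order, and a cell's coordinates are derived from its position; in the vector case, a polygon is stored as a list of coordinate tuples with several attributes attached. The grid origin (taken as the upper-left corner), the cell size, the attribute names and the sample values are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Raster (tesseral) layer: one thematic value per cell, stored in line-scan order.
NCOLS = 3                     # cells per row (assumed)
ORIGIN: Point = (0.0, 60.0)   # upper-left corner of the grid (assumed)
CELL_SIZE = 10.0              # square cells, i.e. pixels (assumed size)
soil_raster: List[str] = ["clay", "clay", "sand",
                          "sand", "loam", "loam"]

def cell_center(index: int) -> Point:
    """Derive a cell's center coordinates from its scan-order position.

    No coordinates are stored: row and column follow from the index."""
    row, col = divmod(index, NCOLS)
    x0, y0 = ORIGIN
    return (x0 + (col + 0.5) * CELL_SIZE, y0 - (row + 0.5) * CELL_SIZE)

# Vector layer: explicit coordinate tuples plus several attributes per element.
parcel_boundary: List[Point] = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]
parcel_attributes: Dict[str, str] = {"land_use": "pasture", "soil": "clay"}

def polygon_area(ring: List[Point]) -> float:
    """Shoelace formula over a closed ring (the last vertex joins the first)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

print(soil_raster[4], cell_center(4))
print(polygon_area(parcel_boundary), parcel_attributes)
```

The raster layer trades positional precision (everything inside a cell shares one value and one derived location) for compact storage, while the vector parcel keeps an exact boundary from which measures such as area can be computed.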
2.2. GIS functionalities
There is no consensus in the scientific community about the complete functionality of GIS [5], [36], [52]. This derives directly from the absence of a unique definition for the term GIS. There exists, however, a kernel of functions that is present in almost all systems, and all these functions are essential for proper handling of georeferenced data. The functionality expected from a GIS varies according to the application domain and user profile [37]. For instance, in the case of digital cartography, GIS are expected to provide services in map processing and presentation; AM/FM applications require sophisticated database functions, whereas environmental planning demands image processing capabilities. Rhind, Openshaw, and Green [53] identify some of the essential functions in GIS as: input and encoding; manipulation; retrieval; analysis; presentation; and data management. Maguire and Dangermond [36] introduce a similar taxonomy, based on the stages of georeferenced data transformations: capture; transfer, validate and edit; store and structure; generalize and transform; query and analyze; and present. We consider four main classes of functions that must be provided by GIS:
1. Input Functions: These are functions that must be performed before data can effectively be used in a GIS. In general, these functions consume large amounts of processing and I/O time, and they may load huge volumes of data into the system. This group includes:
(a) Capture: procedures and devices for geographic data collection, such as remote sensing, GPS, scanners and digitizing tablets.
(b) Transfer: moving previously captured data, stored in device-specific formats, into GIS databases.
(c) Error handling: validating the data and correcting unusual values.
The application of these functions to georeferenced data involves hard problems, which include: integration of different data sources; conversion of analogue data to digital format; classification of remote sensing data; and data quality assurance [22].
2. Data Modeling Functions: Functions in this group support the data modeling activities described in the previous section. In other words, these functions should allow users to organize the data prepared and integrated by the input functions according to the needs of particular applications, and following the users' views of geographic reality. The idea of data modeling (in the sense used in this paper) is usually absent in GIS. For this reason, data modeling functions are not considered a class per se in existing taxonomies. We separate them from the other groups of functions because they are fundamental to proper geographic database specification and subsequent data reuse. Current GIS provide only very low-level data modeling functions, which are far from the users' view of the world. Rather than performing high-level data modeling, users are required to store and structure data according to the data structures implemented by the GIS. This organization of data is usually biased towards performance aspects, and it determines the internal representation and the range of possible analysis functions.
3. Analysis Functions: Functions allowing query and manipulation of data stored in the GIS are generally referred to as analysis functions. Together with input functions, these have received considerable attention from the GIS community, since they distinguish GIS from other types of information systems. Process modeling is in fact a high-level algorithmic description that combines analysis and data transformation functions. Although analysis functions have been exhaustively studied, their use in current GIS is hampered by the absence of suitable process modeling support. A large set of analysis functions may be available, but users often do not know about them or cannot take advantage of them, because GIS data structures are not flexible enough. What is lacking in this scenario is a semantic data model that allows users to define conceptual entities (through attributes and generic behavior) rather than data structures (and specific operations). There are many taxonomies of analysis functions (e.g., [3], [5], [36], [53]). A recent study [66] defines six basic categories that are domain- and data-model-independent,
and which are needed by any GIS task: search/reclassification; location analysis; terrain analysis; distribution/neighborhood; statistics; and measurements. Interestingly, even though it is claimed to be domain independent, this classification does not consider several network-based operations that are used in utility (AM/FM) applications.
4. Presentation Functions: Functions in this group are used to prepare the results of analysis for output. Although presentation is often confused with analysis, the two are clearly separate concepts, since the result of a given set of analysis operations may be presented in several ways. Presentation functions are actually an interface/visualization issue (though of course connected with the application domain, e.g., cartographic production). Typical examples of presentation functions are graphical display generation, report writing, tabular data summarization, and use of cartographic symbology.

2.3. Some approaches to breaking the user-GIS mismatch
Current GIS support neither data nor process modeling. From a computer science point of view, the solution to these problems has long been pointed out by the database and software engineering communities: the mismatch between application design and development may be diminished by providing adequate data modeling and design methodologies and tools that support the users' view of the world. The same modeling facilities help integrate distinct data sources, contributing to the control of data quality standards. There has been intensive work on developing tools and environments to bring GIS closer to the user, thereby "empowering people to utilize GIS as reliable sources" [20]. Bringing the user closer to the system facilitates data collection and decision making [12]. Solutions can be roughly classified into: tools to solve specific problems; new generation GIS; and different kinds of environments. New generation GIS are being developed as prototypes, usually built on top of object-oriented or extensible database systems. These databases are used to solve some of the data modeling and manipulation problems (e.g., [18], [31], [32], [39], [43], [54], [58], [62]). Most of these prototypes rely on the facilities offered by the underlying database to help data specification. Though such systems show concern for helping the user in data modeling, they make no mention of process modeling. These prototypes are typical of the "database-centered" view of GIS. Examples of more general modeling environments are [4] and [56]. We chose these among many other proposals because they stress the importance of process modeling. Nevertheless, they do not consider integrating it with the data modeling activity. The work of [56] is geared towards process modeling. It describes a computational environment for characterizing scientific modeling methods, in order to support representation, manipulation and evaluation of scientific concepts.
Process models are built in terms of R-structures (representation structures), which are
abstract representations of a concept (similar to abstract data types). R-structures can be combined by means of constructors (e.g., aggregation) and constraints. The dynamics of real-world processes are modeled through sequences of transformations of R-structure instances. Since these transformations take data from one state (R-structure instance) to another, geographic phenomena can be modeled using directed graphs of R-structures, in which the nodes are the instances and the edges represent transformations. These ideas were applied to building a computational modeling environment, Amazonia, which supports large-scale hydrologic research. R-structure concepts are implemented with the help of an object-oriented database system. Geographic data may be stored anywhere, from geographic databases to files across a network. This type of environment can certainly help GIS users model processes, in a way akin to algorithm construction, and is thus an important means of bridging the gap between end-users and GIS computational tools. It does not, however, help in designing or using data sets, or in browsing through existing data, which is one of the ways GIS users operate. A different approach is proposed by Hermes [4], an environment that helps to manage geographic data for planning decisions in urban areas. Hermes is based on an urban planning methodology, which guides the user in selecting appropriate data and computing scenarios in order to make planning decisions. The methodology considers three planning steps: strategic planning, which defines guidelines for classifying urban areas with respect to desired constraints; tactical planning, which finds potentially suitable areas according to the project's goal; and operational planning. Hermes allows users to store different algorithms for combining data, each of which corresponds to a model. It is built on top of a relational geographic database system.
Data are modeled according to an extension of the relational model geared towards georeferenced data. Hermes does not consider the field view of modeling, given that it is geared towards urban planning applications. The design of GIS applications supported by a temporal model is also discussed by [59], who implemented a CASE tool to help GIS users do rapid prototyping of applications, which are then implemented in a commercial GIS. The model extends object-oriented notions with time concepts. There is, however, no concern with user specification patterns (user profiles).

3. Integrated environment for modeling and analysis
Roughly, one can say that GIS users work in three modes:

* Modeling of processes and data. User actions in this mode comprise real-world modeling, application design and implementation, and output specification. This is accompanied by a planning activity in which, given a problem, the user decides how best to combine the available data in order to find the answer.
* Production of output. User actions in this case correspond to posing a specific query, whose results may feed
other queries. The querying sequence may be interspersed with browsing/navigation through the geographic database schema and data, allowing the user to "zero in" on the desired result. The output consists primarily of a series of maps or charts. Less frequently, the output corresponds to textual data (e.g., tables).
* Performing spatio-temporal data analysis. Users combine different analysis functions, or even code a complete application, in order to create new scenarios and derive information.

UAPÉ supports the first two types of interaction:

* In the modeling mode, users can design an application and document planning strategies. In this mode, users are guided through the development of applications by a specific methodology, supported by an object-oriented geographic data model called GMOD. This data model is mapped to the underlying database, thereby diminishing the gap between the users' view of the world and the GIS data structures.
* In the output production mode, users can use the graphical interface to navigate through geographic data and schema.
The third type of interaction (computation of complex scenarios and application invocation) is not supported by UAPÉ itself, but rather by the application and the database that were modeled and designed using UAPÉ. UAPÉ users can alternate between the first two modes (e.g., schema browsing in the middle of application design, in order to better choose the appropriate data sets). UAPÉ is defined in terms of an open architecture, and integrates an object-oriented geographic data model with an application design methodology that supports data and process modeling. This section briefly describes the data model and the methodology guidelines, and finally discusses the environment architecture.

3.1. The GMOD data model
Model Overview. GMOD is object-oriented. Since there is no standard definition for object-oriented models, we follow the class-based framework of [8]. We assume that an object is an instance of a class and is characterized by its state, or set of attribute values, and its behavior, or set of operations (methods) that can be applied to the object. An object $o$ can be constructed out of other objects $o_1, \ldots, o_n$, in which case $o$ is called complex and $o_1, \ldots, o_n$ are called the components of $o$. If an object is not complex, it is called simple. Classes can be structured into inheritance hierarchies; the descendants of a class $C$ in the hierarchy are called the subclasses of $C$. An object-oriented database is specified by means of its schema (class and method specification, indicating inheritance and composition links) and its instances (the objects). GMOD is the basis of the communication between the user and UAPÉ. It is an extension of the object-oriented geographic model of [11], allowing the definition of georeferenced phenomena according to both field and object views. Our model extends [11]'s proposal
by including two important considerations: modeling the temporal dimension and describing relationships among entities. GMOD distinguishes between conceptual construction and geometrical and topological representation. It is divided into four levels: the real world level, which comprises the elements of the geographic reality that will be modeled; the conceptual level, which allows modeling geographic elements at a high level of abstraction; the representation level, which introduces the details pertaining to the geometrical and topological representation of the spatial properties of the geographic elements defined at the conceptual level; and the implementation level, which corresponds to the internal structures and operation implementations. In this paper, we concentrate on the conceptual and representation levels. The real world, conceptual and implementation levels have exact counterparts in traditional database design, whereas the representation level does not. Indeed, as stressed in [11], in traditional database design the question of how to represent the properties of an element defined at the conceptual level is too simple to deserve separate discussion. Moreover, properties have just one representation. For example, the database designer decides the representation of employee names and salaries when defining employees. Furthermore, it hardly makes sense to store employee attributes in more than one representation, since the conversion routine is trivial most of the time. By contrast, the representation of the spatial properties of geographic elements involves questions that deserve careful attention (e.g., scale, precision, cartographic projection). GMOD is based on three main concepts: classes (in the sense of an object-oriented model); relationships, which connect these classes in several ways; and constraints, which are imposed on classes, relationships, and their instances.
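The separation between the conceptual and representation levels can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a representation is characterized by a geometry kind, a scale denominator and a projection name; the names `ConceptualEntity`, `Representation` and `best_for` are hypothetical and are not part of GMOD.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Representation:
    """Representation level: how a conceptual entity's spatial properties are stored."""
    kind: str         # illustrative label, e.g. "polygon" or "grid"
    scale: int        # scale denominator, e.g. 50_000 for 1:50,000
    projection: str   # illustrative projection name


@dataclass
class ConceptualEntity:
    """Conceptual level: what the entity is, independent of any geometry."""
    name: str
    representations: List[Representation] = field(default_factory=list)

    def add_representation(self, rep: Representation) -> None:
        self.representations.append(rep)

    def best_for(self, scale: int, projection: str) -> Optional[Representation]:
        """Return the representation in `projection` whose scale is closest
        to the requested one, or None if no such representation exists."""
        candidates = [r for r in self.representations if r.projection == projection]
        if not candidates:
            return None
        return min(candidates, key=lambda r: abs(r.scale - scale))


# Hypothetical usage: one Plantation, three stored representations.
plantation = ConceptualEntity("Plantation")
plantation.add_representation(Representation("polygon", 10_000, "UTM"))
plantation.add_representation(Representation("polygon", 100_000, "UTM"))
plantation.add_representation(Representation("grid", 50_000, "LatLong"))
print(plantation.best_for(25_000, "UTM"))
```

Keeping the conceptual entity free of geometry is what lets one Plantation carry several representations, the case that traditional database design does not need to consider.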
Classes and relationships may be temporal or atemporal, according to whether or not their instances are allowed to vary with time. For readability, class names are written in small caps (e.g., GEOREGION), whereas specific instances are written in italics (e.g., geo-region). Figure 1 shows the class structure at the conceptual level. The main classes are inside the dashed box. The classes outside this box are needed to support the methodology and are described in Section 3.2. The user can establish essentially any type of relationship among any kind of classes; for clarity, these relationships are omitted from the figure. We now explain the class structure, relationships and the temporal dimension. Class structure. The class structure of GMOD is basically the one proposed in [11]. At the conceptual level, there are two basic classes: GEO-CLASSES, whose instances are objects with some spatial component, and CONVENTIONAL classes, which describe real world entities that are not necessarily georeferenced. For instance, an agricultural planning application may need to handle information about sugar cane species or fertilizer properties. This information is normally stored in conventional classes. Instances of these classes may eventually be associated with instances of geo-classes (e.g., a sugar cane plantation will be described by an instance of a complex geo-class which has a conventional component describing crop properties). This distinction between conventional and geo-classes allows different applications to share non-spatial data, and helps application design and data reuse.
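The split between conventional and geo-classes, and the sharing of non-spatial data it enables, can be sketched as follows. This is an illustrative Python rendering under stated assumptions: a geo-region is reduced to a tuple of polygon vertices, and the names `GeoClass`, `Conventional`, `CropSpecies` and `Plantation` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeoRegion:
    """Stand-in for GEOREGION: a polygon given by its (x, y) vertices."""
    vertices: Tuple[Tuple[float, float], ...]


class Conventional:
    """Marker base for CONVENTIONAL classes: no spatial component."""


@dataclass
class GeoClass:
    """Base for GEO-CLASSES: every instance has a spatial component."""
    location: GeoRegion


@dataclass(frozen=True)
class CropSpecies(Conventional):
    """Non-georeferenced crop properties (attributes are illustrative)."""
    name: str
    maturation_months: int


@dataclass
class Plantation(GeoClass):
    """Complex geo-class with a conventional component describing the crop."""
    crop: CropSpecies


# The same conventional object is referenced by two plantations, which may
# belong to different applications.
variety = CropSpecies("variety-A", 12)
plot_north = Plantation(GeoRegion(((0, 0), (1, 0), (1, 1))), variety)
plot_south = Plantation(GeoRegion(((2, 0), (3, 0), (3, 1))), variety)
print(plot_north.crop is plot_south.crop)
```

Because `CropSpecies` carries no location, any application, spatial or not, can reuse it.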
Figure 1. The GMOD data model.
Both conventional and geo-classes can be specialized, using the object-oriented notion of inheritance. Their objects may maintain several types of relationships among themselves, which are described further on. The model is general in the sense that geo-classes can model phenomena in an arbitrary number of dimensions (one of which is time). However, due to implementation problems at the representation level, we have restricted ourselves to 2D spatial modeling. The spatial dimension of geo-class instances is modeled in geo-regions, which describe (2D) regions of the Earth's surface according to some projection and scale. These classes are often ignored in geographic data models, and are sometimes named ``geometric object'' classes. Geo-regions, however, are at a different level: they abstract spatial characteristics, whereas ``geometric objects'' are usually connected with vectorial representation concepts. Geo-classes can be further specialized into classes of geo-objects, which describe entities of the world according to the object model, and geo-fields, which correspond to a field view description. Geo-objects and geo-fields always have a location attribute describing the geo-region to which they apply, and several conventional attributes describing their non-spatial characteristics. More precisely, a geo-field is an instance of the class GEOFIELD and, besides conventional components, always has a domain attribute (location), whose value is a geo-region A; a range attribute, whose value is a set V; and a geo-field mapping attribute, whose value is a mapping from A into V, modeling a field view in A whose values range over V. GEOFIELD is further specialized into the THEMATIC, NUMERIC and REMOTESENSING classes, reflecting some of the most common data sources found in environmental applications. Geo-objects are instances of the class GEOOBJECT, and always have a location attribute, which is an object of the GEOREGION class. Geo-objects can be elementary, weak or complex.
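The definition of a geo-field as a domain geo-region A, a range set V and a mapping from A into V can be made concrete with a small Python sketch. To keep the mapping checkable, the sketch assumes (unlike GMOD, whose geo-regions are representation-independent) that A has been discretized into a finite set of grid cells; `GeoField` and `value_at` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, Mapping, Tuple, TypeVar

V = TypeVar("V")
Cell = Tuple[int, int]  # assumption: A is discretized into grid cells


@dataclass(frozen=True)
class GeoField(Generic[V]):
    """A geo-field: domain A (location), range V, and a mapping from A into V."""
    domain: frozenset       # the geo-region A, as a set of cells
    value_range: frozenset  # the set V
    mapping: Mapping[Cell, V]

    def __post_init__(self) -> None:
        # The mapping must be defined on all of A and nowhere else ...
        if set(self.mapping) != set(self.domain):
            raise ValueError("mapping must be defined on exactly the domain A")
        # ... and every value must belong to V.
        outside = {v for v in self.mapping.values() if v not in self.value_range}
        if outside:
            raise ValueError(f"values outside the range V: {outside}")

    def value_at(self, cell: Cell) -> V:
        return self.mapping[cell]


# Hypothetical THEMATIC field: soil type over a 2 x 2 region.
soil = GeoField(
    domain=frozenset({(0, 0), (0, 1), (1, 0), (1, 1)}),
    value_range=frozenset({"clay", "sand", "loam"}),
    mapping={(0, 0): "clay", (0, 1): "clay", (1, 0): "sand", (1, 1): "loam"},
)
print(soil.value_at((1, 0)))
```

The constructor rejects mappings that are not total on A or that produce values outside V, which is the A-into-V constraint stated above.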
Complex geo-objects are composed of other geo-objects whereas an elementary
geo-object has no components which are geo-objects themselves. A weak geo-object is a geo-object that exists only as long as it is part of a (unique) complex geo-object. At the representation level, the model introduces other classes which allow users to describe the representation of geo-class instances. This separation allows a given conceptual entity (e.g., Plantation) to have distinct representations (e.g., with different scales, projections or classification parameters). This also helps data integration across applications. Representation classes belong to two main hierarchies, rooted at the classes REPOBJECT and REPFIELD. The first is a hierarchy that allows representations using points, lines and polygons (vectorial representation). REPFIELD subclasses include GRID, TRINET, CONTOUR, PLANARSUBDIVISION and POINTSAMPLE, corresponding to the most frequent field representation models [29]. For instance, TRINET represents a geo-region partitioned into irregular triangles, where a value from V is specified at each vertex and may be associated with different variation functions over the triangle. POINTSAMPLE allows definition of fields resulting from both irregular and regular (e.g., DEM) samplings over the geo-region. The design of a database at the representation level can be viewed as a refinement of that at the conceptual level. The user defines geo-fields, geo-objects and conventional objects and decides, for each class, whether their locations should be represented separately and, for each class of geo-fields, how their mappings should be represented. At the representation level, geo-fields lose the domain and mapping attributes, which are represented by one or more separate objects of REPFIELD classes. Thus, an instance of a geo-class contains a spatial component, described by an object of a geo-region class. This object may in turn be connected to distinct instances of representation classes.
The temporal dimension in GMOD.
Time is a very important issue in any geographic application. Several solutions have been proposed to handle time variations in this context (e.g., [23], [35], [41], [49]). At the same time, there has been intensive work on temporal database models, with recent results described in [15]. One of the most important issues is that time modeling is very much user-dependent, and therefore a model should not constrain users to adopt a specific frame for time evolution. The time dimension can be represented in many ways and described by many kinds of functions. Two types of time axes are usually considered: transaction time (describing when data are actually stored) and valid time (describing how values varied in the real world). In order to allow flexibility in modeling time, our model introduces another class hierarchy, rooted at class TIME. Subclasses of this class model distinct types of time characteristics (e.g., discrete versus continuous, stepwise versus linear variation, and so on). This hierarchy is not visible to the end-user, being defined only as a building block for spatio-temporal classes. The class hierarchies described in Section 1 can be defined by the end-user as temporal (meaning the user is concerned with the evolution of phenomena) or atemporal. In the first case, the class definition is enhanced by including, for each temporal class, a component which points to an object of type TIME. This is similar to [59]'s proposal but, unlike it, allows temporal variation not only of spatial but also of conventional data.
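A minimal sketch of how a TIME hierarchy could serve as a building block for temporal classes follows. The text states only that TIME subclasses distinguish kinds of variation such as stepwise versus linear; the class names, the (instant, value) sample format and the interpolation rules here are assumptions, as is the hypothetical `TemporalPlantation` with one conventional attribute (owner) and one numeric attribute (yield).

```python
from __future__ import annotations
import bisect
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


class Time(ABC):
    """Root of a hypothetical TIME hierarchy over valid time."""

    def __init__(self, samples: Sequence[Tuple[float, Any]]) -> None:
        if not samples:
            raise ValueError("at least one sample is required")
        self.samples = sorted(samples, key=lambda s: s[0])
        self.instants = [t for t, _ in self.samples]

    @abstractmethod
    def value_at(self, t: float) -> Any:
        """Value valid at instant t."""


class StepwiseTime(Time):
    """Discrete changes: a value holds until the next recorded change."""

    def value_at(self, t: float) -> Any:
        i = bisect.bisect_right(self.instants, t) - 1
        if i < 0:
            raise ValueError("instant precedes the first recorded value")
        return self.samples[i][1]


class LinearTime(Time):
    """Continuous change: linear interpolation between recorded instants."""

    def value_at(self, t: float) -> float:
        if not self.instants[0] <= t <= self.instants[-1]:
            raise ValueError("instant outside the recorded interval")
        i = bisect.bisect_right(self.instants, t) - 1
        if i == len(self.samples) - 1:
            return self.samples[i][1]
        (t0, v0), (t1, v1) = self.samples[i], self.samples[i + 1]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


@dataclass
class TemporalPlantation:
    """Temporal class: each varying component points to a TIME object."""
    owner: StepwiseTime         # conventional attribute that changes in steps
    yield_t_per_ha: LinearTime  # illustrative numeric attribute

    def state_at(self, t: float) -> dict:
        return {"owner": self.owner.value_at(t),
                "yield_t_per_ha": self.yield_t_per_ha.value_at(t)}


plantation = TemporalPlantation(
    owner=StepwiseTime([(1990, "Farm A"), (1994, "Farm B")]),
    yield_t_per_ha=LinearTime([(1990, 60.0), (1995, 80.0)]),
)
print(plantation.state_at(1992))
```

Hiding the variation rule inside the TIME object keeps the temporal class itself unchanged whether a component varies stepwise or continuously.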
The possibility of building complex objects implies not only that an object of a temporal geo-class is associated with a TIME element, but also that its components may vary in time. Consider again the sugar cane plantation, and suppose it is divided into distinct plots. If plots are allowed to vary temporally, several types of temporal evolution may occur:
* the content of a plot may change in time (e.g., by having distinct sugar cane species);
* the boundaries of a plot may change in time (i.e., the corresponding geo-region may vary);
* plots may disappear or be created.
All of these modifications may occur without changing the characteristics of the Plantation itself. However, the plantation will obviously be modified, because some of its component objects will have changed. Moreover, the complex object Plantation may also vary in time regardless of changes in its components, for instance by changing owner (a conventional attribute). The description of the complete temporal model and its implications for temporal consistency in class hierarchies or composition relationships are beyond the scope of this paper; the reader is referred to [9], [38] for details. Relationships in GMOD. One problem in most geographic data models is that they ignore the possibility of modeling relationships among real world phenomena. Many proposals are, in fact, based on the entity-relationship paradigm [22], but this does not offer enough semantic power to describe geographic entities. Geographic modeling considers two types of relationship: explicit and implicit. The first must be specified by the user (e.g., in modeling temporal or spatial processes). Implicit relationships are those that can be derived from existing data (e.g., spatial relations [14], [40]). GMOD is concerned with explicit relationships: we define five main relationship types that can be expressed by the user, namely generalization/specialization, aggregation, association, versioning and causal. The examples that follow concentrate on the structure of the database classes. In order to properly model relationships, they must be subject to constraints and methods. Generalization/specialization allows users to generalize/derive new entities from those already specified, and is inherent to object-oriented modeling through the ``inheritance'' mechanism (is-a relationship). For instance, in a socio-economic application, a BUILDING may be specialized into {RESIDENTIAL, COMMERCIAL, SERVICE}.
The aggregation relationship allows building complex objects and is supported by the ``composition'' mechanism of an object-oriented model (part-of relationship). For instance, an archipelago may be described by aggregating islands, and a road network can be specified as composed of (sub)roads and limiting topography features. Associations correspond to general relationships in ER models, allowing users to establish different types of connections among entities. Some association relationships may have attributes of their own. A conceptual level object in GMOD is linked to its many
representations by means of association relationships. Associations can connect classes both in time and in space. For instance, a road is associated with a bridge. All other types of relationships not directly supported by object-oriented modeling can be seen as some kind of association, but we distinguish two semantic variations, versioning and causal, given their importance from a user's point of view. Versioning relationships allow users to determine connections among versions of a single concept. One important type of versioning relationship is the one imposed by temporal evolution: e.g., two objects of different classes may be connected because they represent the evolution of a given phenomenon in time. Versioning also allows linking distinct representations of a given entity [42]. Furthermore, versioning allows connecting distinct scenarios which have been derived, for the same region, for a given planning project. Each scenario may be the result of a different process model, and versioning relationships help subsequent navigation through these scenarios. We introduced versioning relationships in GMOD because of their importance in geographic process modeling. Even more important in this sense are causal relationships, which establish cause-effect links among modeled phenomena. For instance, the type of soil of a given area directly affects vegetation characteristics. Causal relationships are common in environmental studies, where human activities are studied according to their impact on nature. Causal relationships may be global or local, and their specification is fundamental to an appropriate process model. In some cases, they correspond to (dynamic) integrity constraints. Versioning and causal relationships are inherently temporal. For instance, the maturation curve of a given crop may vary in time according to soil and fertilizer properties.
Likewise, a given phenomenon may only occur after a specific human action (e.g., water pollution may be caused by industrial waste). Thus, GMOD allows users to specify the temporality of causal and versioning relationships, which in turn means their instances will be connected with TIME object instances.
3.2.
Environmental application design methodology
The variety of methodologies used in environmental planning applications grows continuously. For instance, [44] inventories over 100 methodologies for environmental planning applications. Most of them are heuristics-driven and are bound by the application domain, the data sources available and the users' approach to the problem, which influences the process model used. In several cases, no clear-cut methodology (in a software engineering sense) can be discerned; sometimes no model exists; furthermore, in many cases users do not follow any given set of steps, but go through iterative building of scenarios and outcome analysis. The methodology supported by UAPÉ is the result of two years' work with different types of applications in the domain of environmental planning and control. In order to arrive at this methodology, we received considerable help from researchers of the Geosciences, Civil Engineering and Agriculture Engineering Institutes of UNICAMP, who have extensive hands-on experience in distinct types of environmental planning applications using GIS.
To support this methodology, GMOD provides additional database classes, which document users' projects and allow connecting the same data to different applications. These classes (DOCUMENT, PROJECT, PROJECTAREA and INFOLAYER) are described in what follows; Figure 1 also shows them. The methodology has been conceived on the following premises:
* it must support both process modeling and data modeling;
* it must take into consideration the fact that real world modeling and simulation is an iterative activity. Thus, it has to support an incremental development life cycle, where users alternate between design, test and validation activities;
* it must support users' work patterns and at the same time induce them to follow software engineering procedures;
* it must help users document their activities, and take into consideration the fact that many of them cannot be automated.
Rather than describing the methodology at length, we discuss the main guidelines which orient it, and which have been implemented in UAPÉ. Briefly, the methodology supports user activities as if they consisted of a sequence of data and information transformations (the process-oriented view of GIS, see Section 2). Data sources can be of any type (files, user actions, etc.). Thus, the development of an application (a Project) is a process that is triggered by the need to solve an environmental problem, and whose final output is a combination of electronic data and of policies and strategies that direct the implementation of the solution. The methodology guides end-users through a sequence of five basic steps in environmental planning: definition of objectives; modeling; information integration, prognosis and identification of alternatives; decision taking; and policy and strategy definition. Each step comprises several tasks, which can be accomplished by different means (automatic or manual). The definition of objectives comprises the following tasks: describing the problem; delimiting the geographic area of interest; defining pertinent categories, factors and parameters; and selecting working scales. The description of the problem and the final goal are recorded in an object of the PROJECT class, and the area of interest is defined in a PROJECTAREA object, which describes the geo-region to be considered by the project. The definition of factors and parameters corresponds to determining the geographic themes (factors) that will be considered and, for each theme, the relevant attributes (parameters). It is in this stage that the user specifies some of the database classes. The modeling step corresponds to process modeling. In a first stage, the user identifies the analysis functions to be applied, and in a second stage defines how to execute them according to a set of algorithms and models. The result of this step is the specification of classes in the INFOLAYER hierarchy.
InfoLayer objects correspond to different views (scenarios) of the ProjectArea object. These views are made persistent in order to allow their subsequent retrieval and manipulation. Thus, one instance of PROJECTAREA is associated (through an association relationship) with several InfoLayer objects. For instance, if the ProjectArea object describes a given farmland region, and the user is
concerned with crop rotation planning, one INFOLAYER subclass may be called VEGCOVER and contain several descriptions of vegetation cover according to different classification criteria. Another subclass may be SOILMAP, and so on. Information layers are not the same as thematic layers, since they allow integration of more than one theme into a single object. They can be combined by means of analysis functions in order to build more complex objects (e.g., an overlay of VEGCOVER and SOILMAP instances). Information integration, prognosis and identification of alternatives requires executing the model specified in the modeling step, generating instances of InfoLayers and other derived data. In this step, different scenarios may also be integrated. At this point, users may provide alternative strategies to attain the desired goal. Decision taking occurs when the expert chooses a planning strategy to achieve the goal, by selecting a scenario or information layer from the set of alternatives produced by the previous step. Decisions may be aided by several tools and techniques, such as multicriteria analysis. It is essentially a human-dependent task and cannot be completely automated. Finally, policy determination corresponds to the specification of norms and procedures to be adopted in order to implement the solution chosen in the previous step. These procedures can be of a technical, juridical or administrative nature. Standard norms may be extracted from a separate database (e.g., of legislation), which is made available to the user. Although the methodology is supported by UAPÉ, there are several tasks that cannot be performed automatically (e.g., census taking during data collection, or policy definition). In these cases, UAPÉ indicates to the user which tasks must be executed. The user can record the decisions and actions taken, which are stored in objects of class DOCUMENT.
In the present implementation of UAPÉ, the user is prompted to provide documentation at each step (e.g., describing the reasons for deciding on a given scenario). The definition of objectives, modeling and information integration steps are directly supported by the GMOD data model, which allows specification of the classes and relationships to be used in the application. We finally remark that another important user activity, monitoring, is not supported by the methodology. Monitoring corresponds to long-term re-evaluation of the solution and of its implementation, checking these against the application goals (e.g., observing government enactment of a recommendation). This may result in database updates and model revision, which can of course be performed in UAPÉ, but not in terms of long-running transactions.
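The way the methodology ties PROJECT, PROJECTAREA, INFOLAYER and DOCUMENT together can be sketched in Python. This is one possible reading, not UAPÉ's implementation: the attributes are illustrative, and the rule that a user may revisit any earlier step but may not skip ahead is an assumption meant to reflect the iterative, incremental life cycle described above.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

STEPS = (
    "definition of objectives",
    "modeling",
    "information integration, prognosis and identification of alternatives",
    "decision taking",
    "policy and strategy definition",
)


@dataclass
class Document:
    """Records a decision or manual action taken during a step."""
    step: str
    note: str


@dataclass
class InfoLayer:
    """One persistent scenario (view) of the project area."""
    name: str       # e.g. "VegCover", "SoilMap"
    criterion: str  # classification criterion that produced it (illustrative)


@dataclass
class ProjectArea:
    region: str                                            # stand-in for a geo-region
    layers: List[InfoLayer] = field(default_factory=list)  # one area, many scenarios


@dataclass
class Project:
    problem: str
    area: ProjectArea
    reached: int = 0                                       # steps unlocked so far
    documents: List[Document] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        return STEPS[self.reached] if self.reached < len(STEPS) else None

    def record(self, step: str, note: str) -> None:
        """Work on a step: earlier steps may be revisited, later ones not skipped."""
        index = STEPS.index(step)  # raises ValueError for an unknown step name
        if index > self.reached:
            raise ValueError(f"cannot skip ahead to {step!r}; next is {self.next_step()!r}")
        self.reached = max(self.reached, index + 1)
        self.documents.append(Document(step, note))


area = ProjectArea("farmland region")
area.layers += [InfoLayer("VegCover", "criterion A"),
                InfoLayer("VegCover", "criterion B"),
                InfoLayer("SoilMap", "texture classes")]
project = Project("crop rotation planning", area)
project.record("definition of objectives", "goal, area and working scale fixed")
project.record("modeling", "overlay vegetation cover with soil map")
project.record("modeling", "model revised after first results")  # iteration
print(project.next_step())
```

Every call to `record` leaves a DOCUMENT-like trace, mirroring the prompt for documentation at each step.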
3.3.
The environment architecture
In this section we describe the basic architecture of our environment, which improves on the architecture introduced in [46]. We first describe the architecture from the user's point of view, and then provide functional details from an implementation perspective.
Figure 2. The conceptual organization.
From the user's perspective, UAPÉ can be seen as composed of a set of integrated modules (see Figure 2). The concepts behind these modules, rather than the functional organization, are directly used and perceived by UAPÉ users, which is why we call this view a ``conceptual'' architecture.
1. User Interface (UI) Module. Users of UAPÉ are able to explore the underlying database through a direct manipulation user interface. This interface provides a uniform framework for working with the other concepts of UAPÉ. The main goal of the user interface is to offer facilities for modeling and browsing activities within UAPÉ. These facilities range from graphic editors for schema design to browsing mechanisms for data retrieval and manipulation, with help from the Methodology Advisor (see below).
2. Modeling & Design (MD) Module. As remarked in Section 3.1, GMOD allows users to define the database from a high-level point of view, describing not only the structure but also the behavior of the entities. Therefore, it is possible to specify both data and process models with GMOD. The MD module is responsible for supporting the user in the description of these models. To attain its goal, the MD module embeds the knowledge about GMOD. Nevertheless, it depends on the UI module to interact with the user, and on the Methodology Advisor module to help the user define the database schema and select analysis functions.
3. Methodology Advisor (MA) Module. This module is designed to assist the user of UAPÉ in modeling activities. It embeds knowledge about the methodology described in Section 3.2 and guides the user in following its steps. The MA module offers users two basic types of assistance. First, it directs process and data modeling following the methodology guidelines. Second, it validates users' actions, suggesting possible alternatives in case of invalid actions. It also checks the existence of related concepts in the databases, in order to promote schema and data reuse.
4. Retrieval & Manipulation (RM) Module. The RM module is used to retrieve and manipulate data within UAPÉ. The information handled by RM must have been defined through the MD module, and involves both data and meta-data about geographic entities. Facilities of the RM module are used by the UI module, to present data to the user; by the MD module, to reuse data and meta-data during modeling activities; and by the MA module, to assist the user in these activities.
5. Encapsulated Geo-Database (ED) Module. The underlying GIS databases are abstracted to the users of UAPÉ through the ED module. Therefore, users work with the idea that all the data are stored in UAPÉ and organized according to the concepts defined in the MD module. The data in the ED module are manipulated through functions of the RM module. The ED module is also responsible for keeping the meta-data associated with the geo-databases.
While the conceptual organization describes the components of UAPÉ from the users' perspective, the functional architecture describes its software engineering specification (see Figure 3). The layers defined in the functional architecture provide support to the basic conceptual modules of the environment. This organization is based on a multi-layer description, which is connected to different GIS by means of external drivers, whose goal is to translate between UAPÉ structures and distinct GIS. The final goal is to allow integration of UAPÉ with GIS. There are two basic approaches to integrating modules with existing software, and in particular with a GIS: strong integration and weak integration [63]. In the former, the modules become part of the geographic system, sharing its data model and taking advantage of knowledge about its internal data structures. In the latter approach, weak integration, they are considered to be external modules, and are therefore adaptable to more than one system.
Weak integration of a module with a GIS demands the definition of communication and data conversion protocols between the module and the GIS. In strong integration this is not necessary, but it is very difficult to use data from different sources, and it is not possible to adapt the same user interface to different GIS.
Figure 3. The functional architecture.
Therefore, our architecture is based on the weak integration approach, following a world-wide trend towards the development of open GIS (e.g., [1], [26], [31], [50]). Moreover, this kind of integration provides independence and improves the functional specialization of each component. The architecture has three main layers:
* The User Dialog layer manages the user interaction with the interface system. It is responsible for two main tasks: the creation and management of presentations, and the translation of the user's requests into GMOD operations (and vice versa). These tasks are performed, respectively, by the Presentation manager and by the Interaction manager. Presentations are defined and managed on screen through two areas: a control area for query definition and a display area for result visualization. This task involves graphical display operations (display area) and dynamic construction of widgets (control area). The second main task is the translation of the user's direct manipulation actions into high-level conceptual operations on georeferenced data, managed by the Data Model layer.
* The Data Model layer is responsible for providing the user with the GMOD view of the geographic database (see Section 3.1). Its structures are stored in an internal database, which at present is built on O2. It supports browsing on concepts rather than on representations, allowing the user to manipulate multiple representations of the same data by invoking methods on the corresponding class objects. Another important task of this layer is to convert conceptual operations into representation-dependent operations, which are sent to the external driver. The Conceptual manager is responsible for the object-oriented schema that describes geographic entities. The Representation manager records the different representations associated with each conceptual entity. The Graphic manager handles the graphical objects in the presentations that correspond to conceptual objects in the Conceptual manager. The Methodology manager is responsible for guiding the user through application and database design, and also has access to the environment database.
* The External Driver layer is responsible for communication between UAPÉ and different GIS. It converts data from the format used in the GIS to GMOD and vice versa. This is achieved by means of a communication protocol based on primitive operations that retrieve database schemata, class descriptions and data values from the GIS. At present, the environment runs on top of O2. We have begun to design an external driver for the SPRING GIS. The approach used in the definition of this layer is the same we used to integrate a user interface with different object-oriented database systems [48]. The interface sends queries using the primitive operations (Get-Schema, Get-Class, and Get-Value), and the external driver implements these operations according to the syntax of the underlying geographic DBMS.
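The weak-integration idea behind the External Driver layer can be sketched as an abstract interface with one method per primitive operation. The method names below (`get_schema`, `get_class`, `get_value`) are hypothetical Python renderings of Get-Schema, Get-Class and Get-Value, their signatures are assumptions, and `InMemoryDriver` merely stands in for a concrete back end such as SPRING or O2.

```python
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class ExternalDriver(ABC):
    """Hypothetical driver interface: one method per primitive operation."""

    @abstractmethod
    def get_schema(self) -> List[str]:
        """Get-Schema: names of the classes stored in the underlying system."""

    @abstractmethod
    def get_class(self, name: str) -> Dict[str, str]:
        """Get-Class: attribute names and types of one class."""

    @abstractmethod
    def get_value(self, class_name: str, oid: Any) -> Dict[str, Any]:
        """Get-Value: attribute values of one stored object."""


class InMemoryDriver(ExternalDriver):
    """Toy driver over dictionaries, standing in for a real GIS back end."""

    def __init__(self, classes: Dict[str, Dict[str, str]],
                 objects: Dict[Tuple[str, Any], Dict[str, Any]]) -> None:
        self._classes = classes
        self._objects = objects

    def get_schema(self) -> List[str]:
        return sorted(self._classes)

    def get_class(self, name: str) -> Dict[str, str]:
        return dict(self._classes[name])

    def get_value(self, class_name: str, oid: Any) -> Dict[str, Any]:
        return dict(self._objects[(class_name, oid)])


def conceptual_view(driver: ExternalDriver) -> Dict[str, Dict[str, str]]:
    """Data Model layer side: built only from the primitives, so replacing
    the driver replaces the underlying system without touching this code."""
    return {name: driver.get_class(name) for name in driver.get_schema()}


driver = InMemoryDriver(
    classes={"Plot": {"species": "string", "area_ha": "float"},
             "Road": {"lanes": "int"}},
    objects={("Plot", 1): {"species": "variety-A", "area_ha": 12.5}},
)
print(conceptual_view(driver))
print(driver.get_value("Plot", 1))
```

Since `conceptual_view` depends only on the abstract interface, supporting another GIS amounts to writing another driver subclass; that is the price weak integration pays, in protocol definition, for system independence.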
The Data Model layer is thus a set of modules that lets the user specify object-oriented classes in GMOD; these classes are then translated by the External Driver layer into GIS structures. This is not the same as logical/physical database modeling; it is rather a translation of one representation (object-oriented) into another (GIS-based). The architecture supports distinct conceptual views of the geographic space. Each conceptual view corresponds to an object-oriented database schema according to the definition of the conceptual level of GMOD. To provide facilities for navigating at both the instance and schema levels, the User Dialog layer uses the GOODIES open system [47], a generic OODBMS browser. Given the complexity of the architecture, we have initially implemented a prototype where, instead of coupling UAPÉ to a GIS, all data are stored in the O2 database management system. User interactions through the interface layer are translated into O2 database schema definitions and OQL queries. Temporal data handling will use an adaptation of the prototype described in [10], which allows temporal data manipulation in the O2 system. Some spatial relationships are already available, but general causal relationships are not maintained.
4.
Using the environment: an example
This section briefly describes an example of the use of UAPÉ as a tool for designing one specific geographic application and database. First, we discuss users' standard procedures. Next, we show how these procedures are systematically supported by UAPÉ. The goal of the application was to develop strategies to prevent contagious and parasitic diseases in the county of Paulinia, state of São Paulo. This application was developed by researchers of the Civil Engineering Faculty at UNICAMP, in cooperation with the health authorities of the county. In order to develop these prevention strategies, the study derived spatial correlations between the biophysical and socioeconomic structures of the region and the health of the population, in terms of the incidence of infectious (contagious) and parasitic diseases. The study mapped the county according to distinct health hazard parameters, indicating to the authorities which regions deserved more attention from a health planning point of view. The final output consisted of a series of maps classifying county zones according to distinct health parameters, and a set of directives pointing out prevention measures.
4.1.
Standard procedures
In this section we describe how users typically design and implement our example application. Application planning and database design are conducted on paper. Experts usually have long-term experience in such activities, from which they derive operating procedures. We remark that it was precisely the observation of these operating procedures that enabled us to define our methodology. The main problem, from the users' point of view, is to "set up a database" on which the
process model will be implemented. This database is defined as the result of six complex activities:
* Identification of the relevant data sources. These are defined based on the project's objectives and region of study.
* Data collection/organization and specification of a process model. The users conducted interviews and analyzed different reports on the area in order to define the process model.
* Modeling. In this phase, the distinct data sources are collected and homogenized in terms of scale and data type. At this point, the user has to consider the GIS data organization, storage characteristics and available analysis facilities. GIS characteristics directly influence data collection and integration, and from here onwards the user is forced to abandon the original conceptual view of the application. In the example, the process model consisted of applying sequences of overlays to different maps created by classification operations.
* Data input and application development. Here, data are finally stored in the GIS database, which requires converting the collected data into the GIS format. This conversion often introduces new errors into the data. Once the data are stored, the application is developed. Application development corresponds to executing the process model (i.e., maps are created via classification and overlay operations).
* Decision taking and policy specification. Users study the distinct results produced by the application, and choose one or more of them as indicating solution directions. These directions are then translated into suggestions/directives to be taken (e.g., in the example, measures to be taken by county officials to improve health care).
* Monitoring. From then on, experts follow the implementation of the directives, and compare the results against the outcome predicted by the application.
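The process model described in the list above (maps created by classification operations and then combined by overlays) can be illustrated with a minimal Python sketch. It assumes a toy raster in which each map is a grid of category labels. The soil and land-use maps, their values, and the risk rules are all hypothetical; the study's actual layers and parameters are not given in the text.

```python
from typing import Callable, Dict, List

# A "map" here is a hypothetical co-registered raster: a list of rows of category labels.
Grid = List[List[str]]


def classify(grid: Grid, rules: Dict[str, str]) -> Grid:
    """Classification: relabel every cell through a lookup table."""
    return [[rules[cell] for cell in row] for row in grid]


def overlay(a: Grid, b: Grid, combine: Callable[[str, str], str]) -> Grid:
    """Overlay: combine two grids of identical shape cell by cell."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("overlaid maps must have the same shape")
    return [[combine(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical input maps and classification rules (not taken from the study).
soil = [["clay", "sand"], ["clay", "rock"]]
land_use = [["urban", "urban"], ["industry", "urban"]]

soil_risk = classify(soil, {"clay": "risk", "sand": "ok", "rock": "ok"})
land_risk = classify(land_use, {"urban": "risk", "industry": "ok"})

# A cell is flagged "high" only where both classified maps indicate a risk.
hazard = overlay(soil_risk, land_risk,
                 lambda s, w: "high" if s == w == "risk" else "low")
print(hazard)  # [['high', 'low'], ['low', 'low']]
```

Keeping classification and overlay as separate functions mirrors the way the process model chains them: any number of classified maps can be fed into successive overlays.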
The documentation of these stages and of the design decisions taken is not systematically organized, and is often dispersed. Moreover, because of the lack of documentation and the absence of an integrated view of the geographic database, users often waste valuable time collecting data that are already available. (This is, in fact, typical of many governmental agencies which, due to the lack of a common standard and data model, collect virtually the same data several times, e.g., in AM/FM applications.)
4.2. Application development under the environment
We now describe how to develop the same application with the help of UAPÉ. First, note that users can define classes only by specializing CONVENTIONAL or GEOCLASSES, or by building more complex classes through the use of association and
aggregation relationships. Also, at every step users can choose to document their actions by creating Document objects, which may be linked to specific data instances by means of association relationships.
Definition of objectives. The system creates instances of Project and ProjectArea, and the user provides the data for these instances, such as the project name, goals and other relevant information (e.g., factors that triggered the need for the project). The ProjectArea object is initialized with the description of the geo-region of interest. In the example, since Paulinia county was not yet defined in the database, the user had to enter its coordinate data.
Definition of the database schema. In this stage, the user alternates between schema/data browsing and schema specification (the RM and MD modules of the conceptual architecture, supported by the Data Model layer of the functional architecture). Browsing allows the user to check whether the desired classes or data are available. Schema definition allows the user to define GEOFIELD, GEOOBJECT and CONVENTIONAL classes. For geo-fields, the user may enter the domain of the mapping when classification analysis operations are envisaged. For instance, a VEGETATIONCOVER geo-field class will specify {riparian woodland, flooded plains, ...}. Examples of geo-field classes for this application included SOIL, HYDRORESOURCES and LANDUSE. Geo-object classes are also defined by specifying their components; examples were HOSPITAL, SCHOOL and INDUSTRY. Each Hospital was represented by a point. Conventional classes are also defined in this stage. In the example, the most important one is the (temporal) hierarchy of Disease, covering both infectious and parasitic modalities, which were stored with time series information about their incidence in the county over the last 5 years. Several relationships are also defined here.
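The three kinds of classes defined in this stage can be pictured with a minimal Python sketch. It is not GMOD syntax. The Python class names, the `DOMAIN` attribute, and the `set_value`/`value_at` methods are hypothetical, and the VegetationCover domain lists only the two values named in the text, without filling in the elided "...".

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Point = Tuple[float, float]


class GeoField:
    """GEOFIELD sketch: maps locations to values drawn from a declared domain."""
    DOMAIN: FrozenSet[str] = frozenset()

    def __init__(self) -> None:
        self.values: Dict[Point, str] = {}

    def set_value(self, location: Point, value: str) -> None:
        # Reject values outside the domain declared for classification.
        if value not in self.DOMAIN:
            raise ValueError(f"{value!r} is not in the domain of {type(self).__name__}")
        self.values[location] = value

    def value_at(self, location: Point) -> str:
        return self.values[location]


class VegetationCover(GeoField):
    # Only the two values named in the text; the source list continues with "...".
    DOMAIN = frozenset({"riparian woodland", "flooded plains"})


@dataclass
class GeoObject:
    """GEOOBJECT sketch: an identifiable entity with a geometry."""
    name: str


@dataclass
class Hospital(GeoObject):
    location: Point  # each Hospital is represented by a point


@dataclass
class Disease:
    """CONVENTIONAL sketch: no spatial component, just a yearly incidence series."""
    name: str
    modality: str  # "infectious" or "parasitic"
    incidence_by_year: Dict[int, int] = field(default_factory=dict)


# Hypothetical instances for illustration only.
veg = VegetationCover()
veg.set_value((0.0, 0.0), "flooded plains")
hospital = Hospital("hypothetical hospital", (1.0, 2.0))
disease = Disease("hypothetical disease", "parasitic", {1995: 12})
```

Declaring the domain on the geo-field class is what lets a later classification step reject values that the schema does not foresee.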
Among the relationships defined in this stage, the user made HEALTHINFRASTRUCTURE a class of complex geo-objects through an aggregation relationship linking HOSPITAL and HEALTHCARECENTER. The user also specified several causal relationships. One example is the relationship between the geo-object class INDUSTRY and the geo-field class LANDUSE, indicating that the installation of an industrial site will change the land use parameters in the site's vicinity. The description of this relationship was stored in a Document object. Figure 4 shows part of the database schema for the application. The DISEASE and PROJECT classes are conventional, whereas all others are geo-classes. During schema specification, UAPÉ helps the user detect existing classes, which do not need to be specified again, through the interaction of the MD and MA modules (the Data Model layer of the functional architecture). The user can also check whether there are available instances of these classes whose spatial extension is covered by the project area. For instance, suppose that the user finds out that HOSPITAL is a class that already exists in the database, and that its specification is adequate. Then, the user can browse the class contents to check whether the appropriate objects have already been entered. Conversely, if LANDUSE has not been defined, the user can return to schema definition activities and specify the class. Figure 5 shows how the user can select the class HOSPITAL from the
schema (left part of the figure) and then query its instances using different types of predicates (right part of the figure). Although this kind of interaction is sufficient for database browsing, it is not enough for combining analysis functions. UAPÉ lets the user browse and view database contents, but does not act as an interface between the user and a GIS application (i.e., it does not replace the GIS).
Figure 4. Using the environment to design a geographic application.
Data Loading. At this step, the user has to load additional data into the database. In the present implementation, data are entered into O2. In the future, the external drivers should provide translation between O2 and the underlying GIS structures.
Modeling. Here, the user defines the desired information layers and specifies how their instances are created (e.g., by means of queries). The process model specified that
InfoLayer instances were to be obtained by classification (of geo-field objects) and overlays. For instance, a SOILCLASSIF infolayer class was defined by applying a classification analysis function to instances of class SOIL, where the analysis used parameters defined by the process model. This infolayer has its geo-region defined by the application's ProjectArea.
Figure 5. Querying class instances.
Information Integration. The instances of InfoLayers are created, and new data are derived to create instances of previously defined classes. In our example, SOILCLASSIF was a geo-field class, and its single object enumerated the types of soil for the classification function. During data loading, the user discovered that LANDUSE data sources were not available. Thus, the contents of the LANDUSE geo-field were derived from VegetationCover and Industry instances. For instance, the "industry" land use category was assigned to areas situated within a given radius of an Industry geo-object.
Decision Taking. The user can now view the contents of the different InfoLayers, and combine them further in order to choose a solution. For instance, the model specified that contagion is more critical in areas of high population density or where transit volume is above a certain level. Thus, the user can combine distinct InfoLayers by means of overlays and parameter weighting to obtain global scenarios. Decision taking is not supported by UAPÉ: the user has to perform these activities by interacting directly with the GIS to produce scenarios, either using the application designed through UAPÉ or invoking the GIS analysis functions on the database created by UAPÉ. Part of this activity is dedicated to finding anomalies in the results, and identifying areas that need specific solutions. In the example, the conventional objects of DISEASE were spatialized (by linking them to the ProjectArea) in maps (infolayers) showing the incidence of diseases, for each year, in the area.
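The Information Integration and Decision Taking steps above can be sketched on a toy grid, under clearly hypothetical assumptions. Cells are unit squares indexed by (row, column), industry locations are given in the same grid units, and the radius, layer names, scores and weights are invented for illustration. The first function derives LANDUSE from vegetation cover and industry proximity. The second combines normalized InfoLayers by a weighted sum, which is one common way to implement "overlays and parameter weighting".

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) of a hypothetical unit grid


def derive_land_use(vegetation: Dict[Cell, str],
                    industries: List[Tuple[float, float]],
                    radius: float) -> Dict[Cell, str]:
    """Label a cell "industry" when its centre lies within `radius` of an
    industry point (grid units); otherwise keep its vegetation-cover class."""
    land_use = {}
    for (row, col), cover in vegetation.items():
        centre = (row + 0.5, col + 0.5)
        near = any(math.dist(centre, p) <= radius for p in industries)
        land_use[(row, col)] = "industry" if near else cover
    return land_use


def weighted_overlay(layers: Dict[str, Dict[Cell, float]],
                     weights: Dict[str, float]) -> Dict[Cell, float]:
    """Combine layers scored in [0, 1] as a weighted sum over their shared cells."""
    shared = set.intersection(*(set(layer) for layer in layers.values()))
    return {cell: sum(weights[name] * layers[name][cell] for name in layers)
            for cell in shared}


# Hypothetical data: one industry at the centre of cell (0, 0), radius of 1 cell.
vegetation = {(0, 0): "riparian woodland", (0, 1): "flooded plains",
              (1, 0): "riparian woodland", (1, 1): "flooded plains"}
land_use = derive_land_use(vegetation, industries=[(0.5, 0.5)], radius=1.0)

# Hypothetical density and transit scores combined with hypothetical weights.
scores = weighted_overlay(
    {"density": {(0, 0): 1.0, (0, 1): 0.0}, "transit": {(0, 0): 0.0, (0, 1): 1.0}},
    {"density": 0.7, "transit": 0.3})
```

In this data, cell (1, 1) lies about 1.41 cells from the industry and keeps its vegetation class, while the other three cells become "industry". A threshold rule (for example, flagging cells where transit exceeds a limit) would be an equally valid reading of the model's "or" condition.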
The disease incidence maps showed an unexpectedly high incidence of contagious diseases in areas not predicted by the model (i.e., areas with a high socio-economic level and a good density of health services). This indicated problems in the model. (The main reason turned out to be that inhabitants of critical areas worked in these non-critical areas during the day, thereby changing the expected contagion profile.) However, this mistake in the process model also pointed out the need for specific health policies in unexpected places.
Policy Recommendation. Given the results obtained (several overlays), different suggestions were made to the county health administration authorities. These included, for instance, the creation of a special health education program in less favored areas. These suggestions were stored in a Document.
5. Conclusions
This paper presented UAPÉ, an environment geared towards helping end-users work with GIS tools. The environment makes a repository of reusable geographic entities available to the user, and generates an intermediate data structure that can be converted, by means of specialized drivers, into the data model of the underlying GIS. One of the main goals of this environment is to create tools that help organize the design of geographic applications. The environment is based on two principles:
* geographic application design methodology: the user is guided through application specification and design using a methodology especially developed for geographic applications, which integrates data and process modeling;
* data model: the user can specify and modify the geographic database according to a geographic object-oriented data model, which considers spatio-temporal data and different types of relationships among database entities.
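The idea of a GIS-independent intermediate structure converted by specialized drivers can be illustrated with a minimal Python sketch. `ClassSpec`, `ExternalDriver`, `ToyDriver` and the pseudo-DDL they emit are all hypothetical. They do not describe SPRING's data model or UAPÉ's actual intermediate format. The sketch only shows the separation of concerns: the schema is described once, and each target GIS gets its own driver.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClassSpec:
    """Hypothetical GIS-independent description of one schema class."""
    name: str
    kind: str  # "GEOFIELD", "GEOOBJECT" or "CONVENTIONAL"
    attributes: Dict[str, str] = field(default_factory=dict)


class ExternalDriver(ABC):
    """One driver per target GIS; only drivers know the target's structures."""

    @abstractmethod
    def translate(self, spec: ClassSpec) -> str:
        """Map one class description onto the target GIS."""

    def translate_schema(self, schema: List[ClassSpec]) -> List[str]:
        return [self.translate(spec) for spec in schema]


class ToyDriver(ExternalDriver):
    """Hypothetical target GIS that accepts one pseudo-DDL line per class."""
    TARGET = {"GEOFIELD": "FIELD_LAYER",
              "GEOOBJECT": "OBJECT_LAYER",
              "CONVENTIONAL": "TABLE"}

    def translate(self, spec: ClassSpec) -> str:
        if spec.kind not in self.TARGET:
            raise ValueError(f"unknown class kind: {spec.kind!r}")
        attrs = ", ".join(f"{n} {t}" for n, t in spec.attributes.items())
        return f"CREATE {self.TARGET[spec.kind]} {spec.name} ({attrs})"


# Hypothetical schema fragment; attribute names and types are illustrative.
schema = [ClassSpec("Soil", "GEOFIELD", {"type": "string"}),
          ClassSpec("Hospital", "GEOOBJECT", {"beds": "int"}),
          ClassSpec("Disease", "CONVENTIONAL", {"name": "string"})]
for line in ToyDriver().translate_schema(schema):
    print(line)
```

Coupling to another GIS would then mean writing another `ExternalDriver` subclass, without touching the schema descriptions.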
UAPÉ is being developed at IC-UNICAMP, and a first prototype uses the O2 DBMS. The application design and planning module and the browsing facilities of the query/browse module are already functional. Support for adequate geographic database querying facilities is still under way. At present, users can pose queries by selecting the desired classes and imposing textual predicates over the extents of these classes; these predicates are translated into O2 queries. We point out that UAPÉ query facilities do not aim to allow users to perform complex analysis operations. At most, users may invoke methods over database objects. This is, of course, just one type of query that can be posed in a GIS environment. More sophisticated analysis processing would require replicating GIS functions inside UAPÉ, turning it into a GIS of its own, which is not the goal of the project. To test this implementation appropriately, we are now designing the external drivers that will couple UAPÉ to the SPRING GIS [33]. The coupling of UAPÉ and SPRING is helped by the fact that the SPRING data model is a proper subset of our data model, which allows direct mapping to the GIS and decreases the complexity of driver building. In addition, the query facilities of UAPÉ are being extended in two ways:
* development of a natural language interface, where users' queries are translated into the GIS query language (LEGAL) [33];
* support for spatial integrity constraints, adapting the mechanism described in [40].
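The present query facility (select a class, impose predicates over its extent, and translate the result into an O2 query) can be sketched as a small Python function that emits ODMG-style OQL text of the form `select x from x in Extent where ...`. The sketch makes several assumptions. The extent name `Hospitals` and the attributes `beds` and `district` are hypothetical. Predicates are simple `(attribute, operator, constant)` triples joined with `and`. String literals containing quotes are rejected rather than escaped, because this sketch does not attempt to reproduce O2's exact escaping rules.

```python
from typing import List, Tuple, Union

Value = Union[int, float, str]
Predicate = Tuple[str, str, Value]  # (attribute, comparison operator, constant)

OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def literal(value: Value) -> str:
    """Render a constant as an OQL literal (simple cases only)."""
    if isinstance(value, bool):
        raise TypeError("boolean constants are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str) and '"' not in value:
        return f'"{value}"'
    raise TypeError(f"unsupported constant: {value!r}")


def to_oql(extent: str, predicates: List[Predicate], var: str = "x") -> str:
    """Build `select var from var in extent where ...` from selected predicates."""
    if not (extent.isidentifier() and var.isidentifier()):
        raise ValueError("extent and variable names must be identifiers")
    clauses = []
    for attr, op, value in predicates:
        if op not in OPERATORS or not attr.isidentifier():
            raise ValueError(f"bad predicate: {attr} {op} {value!r}")
        clauses.append(f"{var}.{attr} {op} {literal(value)}")
    query = f"select {var} from {var} in {extent}"
    if clauses:
        query += " where " + " and ".join(clauses)
    return query


print(to_oql("Hospitals", [("beds", ">", 100), ("district", "=", "Centro")], var="h"))
# select h from h in Hospitals where h.beds > 100 and h.district = "Centro"
```

Validating identifiers and operators before building the string keeps user-entered predicates from injecting arbitrary query text. This matters once the same path is reused by a natural language front end.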
The natural language interface and the support for integrity constraints are not general-purpose, but depend on a specific set of query and analysis facilities. Performance measurements, another extension to be considered in the future, have yet to be properly specified. Finally, UAPÉ has been designed with environmental applications in mind. Most experiments we have conducted so far have been directed at the users of these applications. We envisage extending it, in the future, to other types of applications. In fact, the GMOD data model is independent of the application domain, but the design methodology is geared towards environmental applications. Thus, extending UAPÉ in this sense would mean either modifying the methodology or providing another one for different application domains, notably AM/FM. This, in turn, requires improving the methodology advisor,
which at the moment is embedded in the data model layer, and should ideally become a separate module, possibly an expert system.
Acknowledgments
The work reported in this paper has been conducted with grants from FAPESP and CNPq (Brazil), as well as the PROTEM GEOTEC project and EEC contract ITDC 116. We thank Prof. Rosely Sanchez of FEC-UNICAMP, whose help has been invaluable in validating the methodology and UAPÉ, as well as Professors Jansle Rocha from FEAGRI-UNICAMP and Ardemiris Barros from FEC-UNICAMP, for their feedback and for sharing their experience. The basic classes of GMOD are the result of many discussions with M. Casanova and A. Hemerly from IBM-Brazil, and G. Camara from INPE.
References
1. D. Abel, S. Yap, R. Ackland, M. Cameron, D. Smith, and G. Walker. Environmental Decision Support Systems Project: an Exploration of Alternative Architectures for Geographical Information Systems. International Journal of Geographical Information Systems, 6(3):193-204, 1992.
2. G. Alonso and A. El Abbadi. GOOSE: Geographic Object Oriented Support Environment. In Proc. ACM/ISCA Workshop on Advances in Geographic Information Systems, pages 38-43, 1993.
3. J. Antenucci, K. Brown, P. Croswell, M. Kevany, and H. Archer. Geographic Information Systems: a Guide to the Technology. Van Nostrand Reinhold, 1991.
4. F. Arcieri, E. Apolloni, L. Barella, and M. Talamo. Hermes: an Integrated Approach to Modeling Data Base Systems Design. In Advances in Database Systems: Implementations and Applications, pages 71-93. Springer Verlag Courses and Lectures 347, 1994.
5. S. Aronoff. Geographic Information Systems. WDL Publications, Canada, 1989.
6. M. Batty and Y. Xie. Modeling inside GIS: Part 2. Selecting and Calibrating Urban Models Using Arc-Info. International Journal of Geographical Information Systems, 8(4):451-470, 1994.
7. M. Batty and Y. Xie. Modeling inside GIS: Part I. Model Structures, Exploratory Spatial Data Analysis and Aggregation. International Journal of Geographical Information Systems, 8(3):291-308, 1994.
8. C. Beeri. Formal Models for Object-oriented Databases. In Proc. 1st International Conference on Deductive and Object-oriented Databases, pages 370-395, 1989.
9. M.A. Botelho. Incorporation of Spatio-temporal Facilities in an Object-oriented Database. Master's thesis, UNICAMP, December 1995. (In Portuguese.)
10. A.R. Brayner. Implementation of a Temporal Database using an Object Oriented Database System. Master's thesis, DCC-UNICAMP, April 1994. (In Portuguese.)
11. G. Camara, U. Freitas, R. Souza, M. Casanova, A. Hemerly, and C.B. Medeiros. A Model to Cultivate Objects and Manipulate Fields. In Proc. 2nd ACM Workshop on Advances in GIS, pages 20-28, 1994.
12. S. Carver, I. Heywood, S. Cornelius, and D. Sear. Evaluating Field-based GIS for Environmental Characterization, Modeling and Decision Support. International Journal of Geographical Information Systems, 9(4):475-486, 1995.
13. N. Chrisman. Deficiencies of Sheets and Tiles: Building Sheetless Databases. International Journal of Geographical Information Systems, 4(2):157-168, 1990.
14. E. Clementini, J. Sharma, and M. Egenhofer. Modeling Topological Spatial Relations: Strategies for Query Processing. Computers & Graphics, 18(6):815-822, 1994.
15. J. Clifford and A. Tuzhilin, editors. Recent Advances in Temporal Databases. Springer Verlag, Sept 1995.
16. H. Couclelis. People Manipulate Objects (but Cultivate Fields): Beyond the Raster-Vector Debate in GIS. In Proc. International Conference on GIS: From Space to Territory: Theories and Methods of Spatial Reasoning, Springer Verlag Lecture Notes in Computer Science 639, pages 65-77, 1992.
17. D. Cowen. GIS versus CAD versus DBMS: what are the Differences? In Introductory Readings in Geographical Information Systems, pages 52-61. Taylor and Francis, 1990.
18. B. David, L. Raynal, and G. Schorter. GeO2: Object Oriented Contribution for a Geographic DBMS? In Proc. 4th International Conference Database and Expert Systems Applications, pages 377-383, 1993.
19. B. David, L. Raynal, G. Schorter, and V. Mansart. GeO2: Why Objects in a Geographical DBMS? In Proc. 3rd International Symposium Large Spatial Databases, pages 264-276, 1993.
20. M. Egenhofer and D. Mark. Naive Geography. In Proc. COSIT'95, Springer Verlag Lecture Notes in CS 988, pages 1-15, 1995.
21. S. Faiz. Modélisation et Visualisation de l'Information Qualité dans les Bases de Données Spatiales. PhD thesis, Université Paris Sud, 1996.
22. R. Fernandez and M. Rusinkiewicz. A Conceptual Design of a Soil Database for a GIS. International Journal of Geographical Information Systems, 7(6):525-540, 1993.
23. A. Frank, I. Campari, and U. Formentini, editors. Theories and Methods of Spatio-Temporal Reasoning in Geographic Space. Lecture Notes in Computer Science 639. Springer-Verlag, 1992.
24. A. Frank and M. Goodchild. Two Perspectives on Geographical Data Modeling. Technical Report 90-11, National Center for Geographic Information and Analysis, 1990.
25. A. Frank and W. Kuhn. Specifying Open GIS with Functional Languages. In Proc. SSD'95, pages 184-195, 1995.
26. M. Goodchild. Integrating GIS and Environmental Modeling at Global Scales. In Proc. GIS/LIS'91, volume 1, pages 117-127, 1991.
27. M. Goodchild. Spatial Analysis with GIS: Problems and Prospects. In Proc. GIS/LIS'91, volume 1, pages 40-48, 1991.
28. M. Goodchild. The State of GIS for Environmental Problem-Solving. In M. Goodchild, B. Parks, and L. Steyaert, editors, Environmental Modeling with GIS, pages 7-16. Oxford University Press, 1993.
29. M. Goodchild et al. Integrating GIS and Spatial Data Analysis: Problems and Possibilities. International Journal of Geographical Information Systems, 6(5):407-424, 1992.
30. M. Goodchild, B. Parks, and L. Steyaert, editors. Environmental Modeling with GIS. Oxford University Press, 1993.
31. O. Gunther and W.-F. Riekert. The Design of GODOT: an Object-oriented Geographic Information System. IEEE Data Engineering Bulletin, pages 4-9, September 1993.
32. R. Guting. Gral: An Extensible Relational Database System for Geometric Applications. In Proceedings 15th VLDB Conference, pages 33-44, 1989.
33. INPE (Instituto Nacional de Pesquisas Espaciais) and EMBRAPA (Empresa Brasileira de Pesquisa Agropecuária). SPRING: User Manual, February 1993. (In Portuguese.)
34. Z. Kemp and R. Thearle. Modeling Relationships in Spatial Databases. In Proc. 5th International Symposium on Spatial Data Handling, pages 313-322, 1992. Volume 1.
35. G. Langran. Issues of Implementing a Spatiotemporal System. International Journal of Geographical Information Systems, 7(4):305-314, 1993.
36. D. Maguire and E. Dangermond. Geographical Information Systems, Volume I, chapter The Functionality of GIS, pages 319-335. John Wiley and Sons, 1991.
37. D.J. Maguire, M.F. Goodchild, and D.W. Rhind, editors. Geographical Information Systems, Volume I. John Wiley and Sons, 1991.
38. C.B. Medeiros and M. Botelho. Managing Time in GIS. In Proc. GIS Brasil, pages 534-544, 1996. (In Portuguese.)
39. C.B. Medeiros, M.A. Casanova, and G. Camara. The DOMUS Project: Building an OODB GIS for Environmental Control. In Proc. International Workshop on Advanced Research in Geographic Information Systems (IGIS), pages 45-54, 1994. Springer Verlag Lecture Notes in Computer Science 884.
40. C.B. Medeiros and M. Cilia. Maintenance of Binary Topological Constraints through Active Databases. In Proc. 3rd ACM Workshop on Advances in GIS, pages 127-134, 1995.
41. C.B. Medeiros and G. Jomier. Managing Alternatives and Data Evolution in GIS. In Proc. ACM/ISCA Workshop on Advances in Geographic Information Systems, pages 36-39, 1993.
42. C.B. Medeiros and G. Jomier. Using Versions in GIS. In Proc. International DEXA Conference, pages 465-474, 1994. Springer Verlag Lecture Notes in Computer Science 856.
43. P. Milne, S. Milton, and J. Smith. Geographical Object-oriented Databases: a Case Study. International Journal of Geographical Information Systems, 7:39-56, 1993.
44. I. Moreira. Origem e Síntese dos Principais Métodos de Avaliação de Impacto Ambiental (Origin and Synthesis of the Most Important Methods in Environmental Impact Evaluation). In PIAB, editor, Monitoramento Ambiental: Estudos. 1992.
45. T. Nyerges. Understanding the Scope of GIS: its Relationship to Environmental Modeling. In M. Goodchild, B. Parks, and L. Steyaert, editors, Environmental Modeling with GIS, pages 75-93. Oxford University Press, 1993.
46. J.L. De Oliveira and C.B. Medeiros. A Direct Manipulation User Interface for Querying Geographic Databases. In Proc. Int. Conference Applications of Databases, pages 249-258, 1995.
47. J.L. De Oliveira and R.O. Anido. Browsing and Querying in Object Oriented Databases. In Proc. 2nd International Conference on Information and Knowledge Management, pages 364-373, 1993.
48. J.L. De Oliveira and R.O. Anido. Integration of an Interface to Navigate in Different Object Oriented DBMS. In Proc. 13th Brazilian Computer Society Congress, pages 61-75, 1993. (In Portuguese.)
49. D. Peuquet. It's about time: A Conceptual Framework for the Representation of Temporal Dynamics in Geographic Information Systems. In Annals of the Association of American Geographers, 1994.
50. N. Pissinou, K. Makki, and E. Park. Towards the Design and Development of a New Architecture for Geographic Information Systems. In Proc. 2nd International Conference on Information and Knowledge Management (CIKM), pages 565-573, 1993.
51. J.F. Raper and D. Livingstone. Development of a Geomorphological Spatial Model Using Object-oriented Design. International Journal of Geographical Information Systems, 9(4):359-383, 1995.
52. J.F. Raper and D.J. Maguire. Design Models and Functionality in GIS. Computers and Geosciences, 18(4):387-400, 1992.
53. D. Rhind, S. Openshaw, and N. Green. The Analysis of Geographical Data: Data Rich, Technology Adequate, Theory Poor. In Proc. 4th International Working Conference on Statistical and Scientific Database Management, Springer Verlag Lecture Notes in Computer Science 339, pages 427-454, 1988.
54. M. Scholl and A. Voisard. Building an Object-oriented System: the Story of O2, chapter Geographic Applications: an Experience with O2. Morgan Kaufmann, California, 1992.
55. T. Smith, S. Menon, J. Star, and J. Estes. Requirements and Principles for the Implementation and Construction of Large-scale Geographic Information Systems. International Journal of Geographical Information Systems, 1(1):13-31, 1987.
56. T. Smith, J. Su, A. El Abbadi, D. Agrawal, G. Alonso, and A. Saran. Computational Modeling Systems. Information Systems, 20(2):127-153, 1995.
57. L. Steyaert. A Perspective on the State of Environmental Simulation Modeling. In M. Goodchild, B. Parks, and L. Steyaert, editors, Environmental Modeling with GIS, pages 15-30. Oxford University Press, 1993.
58. M. Stonebraker. The SEQUOIA 2000 Project. IEEE Data Engineering Bulletin, pages 24-28, June 1993.
59. P. Story and M. Worboys. A Design Support Environment for Spatio-Temporal Database Applications. In Proc. COSIT, Springer Verlag Lecture Notes in Computer Science 988, pages 413-430, 1995.
60. R. Subramanian and N. Adam. The Design and Implementation of an Expert Object-Oriented Geographic Information System. In Proc. 2nd International Conference on Information and Knowledge Management (CIKM), pages 537-546, 1993.
61. Surveys and Resource Mapping Branch, Ministry of Environment, Lands and Parks, British Columbia, Canada. Spatial Archive and Interchange Format: Formal Definition, 3.1 edition, 1994. Reference Series.
62. T. Vijlbrief and P. van Oosterom. The GEO++ System: an Extensible GIS. In Proc. European GIS Conference, 1991.
63. A. Voisard. Designing and Integrating User Interfaces of Geographic Database Applications. In Proc. ACM Workshop on Advanced Visual Interfaces, pages 133-142, 1994.
64. M. Worboys. Object oriented approaches to georeferenced information. International Journal of Geographical Information Systems, 8(4):385-400, 1994.
65. M. Worboys, H. Hearnshaw, and D. Maguire. Object-oriented Data Modeling for Spatial Databases. International Journal of Geographical Information Systems, 4(4):369-384, 1990.
66. M. Yuan and J. Albrecht. Structural Analysis of Geographic Information and GIS Operations from a User's Perspective. In Proc. COSIT, Springer Verlag Lecture Notes in Computer Science 988, pages 107-122, 1995.
Juliano De Oliveira MSc'93, UNICAMP, is a PhD student at the Institute of Computing, University of Campinas (UNICAMP). He was part of a team that developed the interface library for a complex GIS project for the Brazilian Telecommunications Research Center. His main interests are databases and user interfaces for GIS.
Fátima Pires MSc'80, UNICAMP, is a PhD student at the Institute of Computing, University of Campinas (UNICAMP). She is a senior systems analyst at UNICAMP's Computing Center. Her main interests are databases, software engineering and geographic information systems.
Claudia Bauzer Medeiros PhD'85, University of Waterloo, is an associate professor at the Institute of Computing, University of Campinas (UNICAMP), Brazil. She is the head of the Database Research Group and the leader of a major research project on developing GIS tools and techniques. Presently she is the Editor of the Journal of the Brazilian Computer Science Society. Her interests include active databases, integrity control and geographic database systems.