Published in: The European Information Society: Leading the Way with Geo-Information, Lecture Notes in Geoinformation and Cartography, Springer. Editors: Fabrikant, Sara; Wachowicz, Monica. 2007

Extending Geographic Data Modeling by Adopting Constraint Decision Table to Specify Spatial Integrity Constraints

Fei Wang and Wolfgang Reinhardt

AGIS – GIS lab, University of the Bundeswehr Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany {Fei.Wang, Wolfgang.Reinhardt}@unibw.de

Abstract. The rapidly growing use of geospatial data creates a great demand for a standards-based GIS data model that also covers quality issues. Especially when datasets are shared for different purposes, the quality information becomes important and must be available to the data users. This paper proposes a method to extend geographic data modeling that allows quality information to be considered in a standardized way. In more detail, it is shown how the Constraint Decision Table (CDT) can be used to include spatial integrity constraints in the model. The information in the data model guides the subsequent activities in GIS, especially during the data capture process, so we also focus on integrating the quality information into the data capture workflow. For the integration we use OGC web services and discuss how they can be extended to provide mobile GIS clients with this information.

Keywords: Geographic data modeling, spatial integrity constraints, data quality, field data capture, Mobile GIS.

1 Introduction

As commonly understood in the “Geo-World”, geographic data modeling is the process of selecting phenomena of the real world and organizing them in a spatial information system [7]. During the modeling process the modeling expert uses the requirements of the users to define the content and the details of the model. The Unified Modeling Language (UML) is widely used to describe the model in a conceptual schema. This conceptual schema is then normally implemented in the internal structures of a spatial database and used in the context of a geographic information system. A “good” conceptual schema should contain a detailed description of the selected phenomena, including the quality criteria and rules. We emphasize this point because, on the one hand, the ISO modeling rules [22] place quality information in the metadata model and, on the other hand, there is a lack of methods for describing quality information, and spatial integrity constraints in particular. Moreover, separating the model information from the metadata has a further disadvantage: it requires separate methods and tools to provide users with data and metadata [28]. However, when many quality rules have to be considered, the UML conceptual schema becomes overloaded and unreadable. We therefore propose to extend the UML conceptual schema by means of the Constraint Decision Table (CDT) to include quality information; this paper focuses on spatial integrity constraints.

The modeling process is clearly a cooperative effort between the domain or application expert and a modeling expert. In this paper we therefore assume that the modeling expert uses this quality information directly and structures it in the form described below, so that no specific user interface is necessary. During the data capture process, for example when using mobile GIS in the field, the model information must of course be available. Focusing on a data capture process based on Open Geospatial Consortium (OGC) web services, we also propose a way to extend the documents provided by these services to include quality information, in this case spatial integrity constraints.

Fig. 1 gives an overview of the whole process, from conceptual data modeling through data capture to storage of the data in a remote database. The conceptual model in our case consists of a UML-based schema extended by the CDT [42] to introduce quality information. The UML schema and the CDT are then transformed into XML-based documents to make the information available during the data capture process. The data captured according to the UML/CDT model is finally stored in a database, a step which is not treated in this paper.

This paper is structured as follows. After the introduction we review and discuss the relevant research. We then explain the CDT and, based on it, introduce our proposal for an extended data model that considers quality information, focusing on integrity constraints. The methodology aligns with the ISO/TC 211 and OGC geographic information standards in order to ensure model interoperability. This is followed by an overview of the data capture process in the field and of how the concept fits into it. To illustrate the applicability of this research, we give examples from the geological/landslide field. Finally, we present our conclusions and an outlook.

Fig. 1. Geographic data modeling and geospatial data capture workflow

2 Extended Geographic Data Modeling

2.1 UML Conceptual Model and Spatial Integrity Constraints

In the early years, data models developed for geographic applications were guided by internal GIS software structures, so the user’s interpretation of spatial phenomena had to be adjusted to the existing structures [8]. In general, the complex peculiarities of geographic data, such as spatial location, topological relations and temporal relations, are not easy to handle even with modern data modeling techniques. Well-known semantic and object-oriented data models, such as the ER model or the UML model, also do not offer adequate facilities to represent geographic data models with all required details. As a consequence, researchers in geographic data modeling have proposed different approaches to extend the existing modeling techniques for geographic applications [3], [8], [18], [19], [37]. There is also a major trend for geographic data modeling to follow the international geographic standards of ISO/TC 211 and OGC in order to achieve model interoperability [6], [10]. Prominent among this work is that of [3], [5], which introduced the spatial Plug-in for Visual Language (PVL) to extend UML in alignment with ISO and OGC standards. The PVL was implemented in a freeware Computer-Aided Software Engineering (CASE) tool called Perceptory [4], which allows model developers to create geographic conceptual schemas that comply with international geographic standards in a user-friendly way. Fig. 2 shows a UML conceptual schema, created with Perceptory, of three landslide-area feature classes: “Ditch”, “Extensometer” and “Road”.

Fig. 2. UML conceptual schema example

However, difficulties arise when UML is used to develop a conceptual schema that also represents quality criteria, such as integrity constraints, defined by the domain expert. As mentioned before, to ensure data quality in the data capture workflow, quality information has to be considered during the data modeling stage. Spatial data quality has long been discussed in GIS [21], and ISO standards have been developed [23], [25]. In the ISO standards, however, the quality information is included not in the data model but in the metadata. In this research we focus on the consideration of data integrity within the data model. The approach of defining spatial integrity constraints and applying them at data entry to ensure data quality has been accepted in GI science [9], [15], [16]. Cockcroft [14] summarized the traditional database integrity constraints as transaction integrity constraints, static constraints and transitional constraints. She also introduced three new types that reflect the characteristics of spatial data: topological constraints, semantic constraints and user-defined constraints. In this paper we consider topological and semantic constraints.


The question then arises of how to apply spatial integrity constraints during the data modeling stage. Most of the modeling methods mentioned above involve constraints, but these are largely limited to implicit associations and attributes [19]. Although OMT-G [8] allows spatial integrity constraints to be defined in the conceptual schema, it is still difficult to keep the model compact and easily understandable when there is a large number of complex constraints. Another widely used approach is to combine the Object Constraint Language (OCL) with UML to define the constraints [11]. OCL, as the natural companion language of UML, is highly expressive for defining constraints, but the textual description of OCL constraints severely reduces the human/computer readability of the UML conceptual schema. Especially when many complex integrity constraints have to be considered, the UML conceptual schema may become overloaded. A new methodology is therefore needed to solve these problems.

2.2 Constraint Decision Table

To overcome the problems mentioned above we suggest a new method: use UML to model the “normal” information of the geographic data, such as the geographic entities, their properties and simple relationships; describe the specific spatial integrity constraints in another way; and combine the two adequately. A similar idea, in which a constraints repository holds details such as complex constraints, was mentioned in [5]. That work, however, explained neither how to define the constraints nor the structure of the constraints repository, and it did not consider the subsequent transfer of the information, for example to the data capture process.

To define such a spatial integrity constraint, the association between the geographic objects has to be given explicitly. Ubeda [39] proposed the following structure for a topological integrity constraint:

“CONSTRAINT = (Entity class1, relation, Entity class2, Specification)”

It defines the association of two geographic objects, a topological relation between them and a specification (including “Forbidden” and “At least n times”, “At most n times”, “Exactly n times”). Cockcroft [16] extended the approach to allow the topological relation to be considered on the basis of attribute values. Even so, our experience with field data capture shows that this is not enough to solve real-world problems, because neither approach considers what should happen when a constraint is violated. The field user always needs some guidance on how to deal with a violated constraint; this guidance is given by the domain expert or described in product specifications. For example, in a landslide application a constraint might be defined as follows: “The intersection relationship between a Ditch object and a Road object is forbidden”. When field users encounter this situation in the real world, they know that it should not occur, but they do not know explicitly how to react to it. Providing clear instructions to the field user is therefore very important for the quality of the collected data.
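As an illustration only, the four-part constraint structure quoted above can be sketched as a small Python data type. The class and field names, the `is_satisfied` helper and the optional `guidance` field are assumptions made for this sketch; they are not taken from Ubeda [39] or Cockcroft [16]. The `guidance` field merely marks where an instruction for the field user could be attached, and its text is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class SpecKind(Enum):
    """The specification kinds named in the text."""
    FORBIDDEN = "Forbidden"
    AT_LEAST = "At least n times"
    AT_MOST = "At most n times"
    EXACTLY = "Exactly n times"


@dataclass(frozen=True)
class TopologicalConstraint:
    """CONSTRAINT = (Entity class1, relation, Entity class2, Specification)."""
    entity_class1: str
    relation: str
    entity_class2: str
    spec: SpecKind
    n: int = 0           # ignored when spec is FORBIDDEN
    guidance: str = ""   # hypothetical extension: instruction for the field user

    def is_satisfied(self, occurrences: int) -> bool:
        """Check the specification against the number of entity_class2
        objects that stand in `relation` to one entity_class1 object."""
        if self.spec is SpecKind.FORBIDDEN:
            return occurrences == 0
        if self.spec is SpecKind.AT_LEAST:
            return occurrences >= self.n
        if self.spec is SpecKind.AT_MOST:
            return occurrences <= self.n
        return occurrences == self.n


# The landslide example: a Ditch must not intersect a Road.
ditch_road = TopologicalConstraint(
    "Ditch", "intersects", "Road", SpecKind.FORBIDDEN,
    guidance="(instruction supplied by the domain expert)",
)

if not ditch_road.is_satisfied(occurrences=1):
    print("Constraint violated:", ditch_road.guidance)
```

Counting the occurrences of the relation is left to the caller here; in practice that number would come from a spatial query against the captured data.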


Pullar [32] suggested using a rule-based “if-then” structure in a decision table to define decision rules, which offers advantages such as compactness, self-documentation, modifiability and completeness checking. Although a decision table can solve the problem in the previous real-world example by including instruction information, it still does not fulfil all requirements. The following two examples from a landslide application illustrate the need to explicitly define the actions that have to be performed in specific cases and that give detailed instructions on what has to be done. In the first, a way object such as a hiking trail is to be added: the topological relationships between the hiking trail and the surrounding ditch objects have to be checked to find out whether a ditch affects the safety of the trail. In the second, a hiking trail is to be deleted from the database: its topological relations with nearby ditches need not be considered, and the concern is rather whether the transport network is damaged.

We therefore put forward the Constraint Decision Table, which extends the “if-then” structure to an Event-Condition-Action (ECA) rule. An ECA rule is sometimes also called a monitor, a situation-action rule or a trigger in active databases [44]. The term has been widely used since it was proposed within the active database community for monitoring state changes in database systems. An ECA rule consists of events, conditions and actions and has the semantics: “When an event occurs, check the condition and if the condition is satisfied, then execute the actions”. Recently, ECA rules have been used in many settings, such as workflow management, network management, personalization, publish/subscribe technology, the analysis of complex software, and the specification and implementation of business processes [2], [44].
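The ECA semantics quoted above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `EcaRule` class, the `dispatch` function, the context keys `feature_class` and `intersects_ditch`, and the action texts are all hypothetical names chosen to mirror the two hiking-trail examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]  # facts about the feature being edited (assumed shape)


@dataclass
class EcaRule:
    event: str                           # e.g. "addFeature" or "deleteFeature"
    condition: Callable[[Context], bool]
    action: Callable[[Context], str]     # returns an instruction for the field user


def dispatch(rules: List[EcaRule], event: str, context: Context) -> List[str]:
    """When an event occurs, check each matching rule's condition and,
    if it is satisfied, execute the rule's action."""
    return [
        rule.action(context)
        for rule in rules
        if rule.event == event and rule.condition(context)
    ]


RULES = [
    # Adding a hiking trail: its relation to surrounding ditches matters.
    EcaRule(
        event="addFeature",
        condition=lambda ctx: ctx.get("feature_class") == "HikingTrail"
        and bool(ctx.get("intersects_ditch", False)),
        action=lambda ctx: "check whether the ditch affects the safety of the trail",
    ),
    # Deleting a hiking trail: ditches are irrelevant, the network matters.
    EcaRule(
        event="deleteFeature",
        condition=lambda ctx: ctx.get("feature_class") == "HikingTrail",
        action=lambda ctx: "check whether the transport network is damaged",
    ),
]

for instruction in dispatch(
    RULES, "addFeature", {"feature_class": "HikingTrail", "intersects_ditch": True}
):
    print(instruction)
```

Keying the rules on the event is what distinguishes this from a plain “if-then” table: the same feature class triggers different checks depending on whether it is being added or deleted.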
The ECA rule has also been placed on the agenda of the RuleML (Rule Markup Language) standardization initiative [35], which is contributing to the implementation of a web rule language framework. In our approach its use goes beyond active database systems and contributes to the formulation of spatial integrity constraints and of other information for the data quality control process. The CDT (see Fig. 3) contains three main parts, “event”, “condition” and “action”, following the ECA rule, together with the description and the specification of each part. In the following we apply the CDT to spatial integrity constraints. The terms used in the CDT are defined and grouped as follows; the list is not exhaustive:

Fig. 3. Constraint decision table structure

1. Operation: the following operation terms are defined to express specific actions for geographic data.
− updateGeometry - update the geometry of an existing feature.
− addFeature - add a new feature to the datasets.
− deleteFeature - delete an existing feature from the datasets.
− setAttributeValue(AttributeName) - set a value for a given attribute name.
− getAttributeValue(AttributeName) - get the value of a given attribute name.
2. SpatialRelationship: we consider three kinds of spatial relations:
− Topological relations: in this research we only consider topological relationships between the simple features defined by the OGC simple feature specification for SQL [30]. To name the spatial relationship predicates, we adopt seven predicates: disjoint, touches, crosses, within, overlaps, contains and intersects [12], [13], [30].
− Metric relations: these are defined in terms of distance and direction: distanceTo, directionTo [31].
− Relations about partial and total order of spatial objects: partOf [26].
3. Description: it includes the contents of the event, condition and actions, such as operations and spatial relationships.
4. Specification: it shows the results of the expressions in the description column, for instance True/False, or a value with a mathematical operator like
