GeoInformatica 6:4, 363-380, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
Field Data Collection with Mobile GIS: Dependencies Between Semantics and Data Quality

HARDY PUNDT
University of Applied Sciences Harz, Faculty of Automation and Computer Science, Friedrichstr. 57-59, 38855 Wernigerode, Germany

Received September 6, 2000; Revised June 4, 2002; Accepted July 1, 2002
Abstract. Field work is needed in many scientific disciplines as well as in practice, e.g., in surveying or environmental monitoring. Apart from making the data collection process more effective, mobile geocomputing tools are also a means to control data quality during data collection. Such tools must consider the conceptual data models of real world features that are developed in specific spatial information communities. Mobile GIS tools can support data quality through functions that check the data entered by field workers at the moment of entry. Semantic integrity of the database can be achieved through semantic plausibility controls, i.e., rules implemented in a knowledge base that help avoid inconsistencies. Such knowledge based functions must take into account the dependencies that exist between data quality and data semantics. This paper describes such dependencies by example, together with knowledge based functions that are integrated into a mobile GIS as Dynamic Link Libraries. The examples demonstrate the strong application dependency of data quality and raise the question of how to integrate information about specific quality requirements into data models, e.g., for the purpose of multiple data use. The paper proposes the use of data modeling languages to achieve comprehensive data quality descriptions that adequately consider the dependencies between data quality and semantics, and gives some hints on potentially useful links to other techniques for describing data and their semantics.

Keywords: mobile GIS, semantics, data quality
1. Introduction
Environmental and socio-economic investigations often require large amounts of data. Apart from existing data that can be acquired using data mining techniques, the data are often collected during field surveys. Increasingly, field data collection is carried out with pen-based or wearable computers that provide digital topographic and thematic maps as well as input masks for attribute information [29]. Using digital maps and (D)GPS measurements, new point, line or polygon objects can be created. Attribute information is added to such geometries and transferred to other systems, for instance GIS such as ArcView, MapInfo, or SICAD [47]. GISPAD is an example of a field data acquisition application that provides such functionality [23], [40]. Many mobile data collection systems now exist, and the most pressing technical obstacles have been removed in recent years: "A fully digital data flow from data acquisition to a geographic database is desirable. It will reduce the error sources, save time and in the end there is a possibility to share data with others in an easy way and the maintenance of the database is simplified" [5]. Experience with current mobile applications has shown that this statement is only partly correct [39]; in particular, information sharing still raises open questions. Semantic non-interoperability is an issue that affects mobile GI services in particular. Every application and repository has its own data dictionary, its own well-defined terms and concepts, and its own mechanisms for capturing taxonomy and classification hierarchies [17]. This has been recognized by leading organizations such as the Open GIS Consortium [36] and by several scientists: ". . . the semantic issues come to the forefront" [46]. For a long time, one focus of research has been on object catalogs and feature definitions used in specific geospatial information communities (GICs), together with activities to establish Spatial Data Infrastructures [20]. On the geodata market a wide variety of datasets exists, and comparing information across disciplinary and administrative boundaries is difficult [2], [32]. Semantic heterogeneity causes problems for the sharing and multiple use of data, and therefore for the use of mobile GI services. Such services are mostly developed by particular GICs and are therefore necessarily confined to the specific world view of those communities. Field data collection supported by portable or wearable computers enables the inclusion of functions that improve data quality while data are collected outdoors. Such an improvement can be achieved through semantic plausibility control, which must necessarily be based on GIC specific knowledge [41]. Because such control is embedded in a field data collection application of one specific information community, it is hardly usable by another GIC that has a different world view and different requirements concerning data quality. The quality of a dataset, however, depends on the intended and actual use of the dataset [24].
Fitness for use means that the data and the functions working on them must meet the specific requirements of a community GIC_A. The concept aims to supply enough information about the various error characteristics of a dataset to allow the data user to reach a reasoned decision about the data's applicability to a given problem in a given situation [13]. If the feature definitions on which functions such as plausibility control were developed differ too much from those used in another community GIC_B, the functions are not usable for GIC_B. This is what is referred to here as the dependency between data quality and semantics: data quality controlling functions used in one GIC cannot be used in another, because the meaning of the data has been fixed differently in the two GICs. Such dependencies between data quality and semantics are still not adequately addressed in the design of GIS, especially of mobile tools, which in particular should show a high degree of interoperability. This is somewhat surprising when we remember that a data model consists of three components: a collection of object types, a collection of operators, and a collection of integrity rules [12]. The rules that express the plausibility controls (see also Section 3) can be considered such integrity rules, and they must therefore be part of the data model. In current spatial data models, such rules refer mostly to topological and geometric consistency and constraints [8]; semantic integrity rules are often lacking or considered only scarcely. The challenge is how to reconcile the goal of providing functions that guarantee data quality within mobile data collection systems on the one hand with the dependencies between data quality and semantics on the other. If field collected data are to be provided to a wider range of users, these users must gain knowledge about the semantics and the quality constraints that underlie the data collection methodology. Are users able to assess the data's fitness for use if they lack such knowledge? A first step toward answering this question is to provide the user with detailed information about data semantics and the dependencies, both being specific to an information community. The classic data quality elements (see next section) are not sufficient, because they do not consider data quality in a comprehensive manner. Standardization and metadata fail in many cases, due to problems arising from the use of terms, namely synonymy and polysemy, and due to the lack of defined criteria that adequately describe data semantics and GIC specific semantic quality constraints. For these reasons we must question the assumption, formulated in [15], that data quality descriptions should not refer to the production method, or that such descriptions are only operational if they do not depend on human interpretation. Opposed to this argument is the view that quality is necessarily contextual and that a closer linkage of metadata and the context world would improve the usability of quality information in decision making [21]. However, many existing GIS packages contain few, if any, tools for handling metadata, and even where operations on datasets (e.g., automatic plausibility control) are logged, this information is not always directly available to the user [31]. This leads to the question of how quality and quality constraints can be considered within mobile applications so that data users receive enough information to interpret the data adequately. To propose answers to these questions, the paper proceeds by focusing on specific issues of data quality and semantics. Examples are presented in the third section.
They highlight the dependencies between semantics and data quality by presenting knowledge based tools for mobile GIS that are aimed at controlling data quality during data acquisition. That section also gives some remarks on performance studies and on the benefits of including semantic plausibility control and additional knowledge based functions. The fourth section describes an approach to modeling the dependencies and presents a technique to formalize such models using the Resource Description Framework (RDF). The conclusions in the fifth section lead to challenges for future research. The examples describe concretely how data quality can be improved through domain-specific, context-sensitive knowledge based functions that are integrated into a mobile GIS application. This application supports the digital collection of environmental data in outdoor situations. Using such an application means that new datasets are created under explicit consideration of data semantics. Section 4 focuses on how to provide information about data semantics and data quality constraints to users of field collected data. The problem of how to extract semantics from existing data (data mining processes) is not considered in this paper.

2. The "famous five" and additional quality criteria
The International Cartographic Association (ICA) has conducted a study at 288 National Mapping Agencies that aimed at learning more about data quality and quality requirements [38]. An interesting aspect of the ICA questionnaire was that the quality assurance routines focused on "checking printouts of consistency checks" (79%) and "subjective evaluation" (26%) [38]. This points to the difficulty of handling quality elements related to semantics with quantitative methods. Subjectivity plays an important role, which may be interpreted as a consequence of the dependency between quality and semantics. There are several well known criteria to describe the quality of spatial data. They have been called the "famous five" by [13], namely: lineage, positional accuracy, attribute accuracy, logical consistency, and completeness [18], [24], [35], [37]. Standards for the specification of data quality and quality assurance have been developed by various national and international organizations, such as ISO, ANZLIC, or CEN, to mention only some [1], [11], [24]. The widespread acceptance of these quality elements does not necessarily say anything about their suitability for describing the quality of any dataset so that a user can assess fitness for use. Up to now, little empirical work exists to determine how well the "famous five" perform in different scenarios, so their applicability to all situations is open to question [13]. Concerns about applicability led Salge to add another element for describing data quality: semantic accuracy. Semantic accuracy describes the number of features, relationships, or attributes that have been correctly encoded in accordance with a set of feature representation rules. Related to the meaning of the "things" of the universe of discourse (the reality), semantic accuracy refers to the pertinence of the meaning of the geographical object rather than to its geometrical representation [43].
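Because Salge's definition counts correctly encoded items, the idea can be made concrete in a few lines. The following Python sketch is only an illustration under stated assumptions. A "feature representation rule" is modeled as a predicate over a feature's attributes. The example rule and the attribute names (`class`, `width_m`, `area_ha`) are hypothetical. The share of correct features is an added convenience and is not part of Salge's definition.

```python
from typing import Callable

# Assumption: a feature representation rule is a predicate over a feature's
# attribute dictionary that returns True if the feature is correctly encoded.
Rule = Callable[[dict], bool]


def semantic_accuracy(features: list[dict], rules: list[Rule]) -> tuple[int, float]:
    """Count features that satisfy every representation rule.

    Returns the number of correctly encoded features and, as an added
    convenience, their share of all features (0.0 for an empty dataset).
    """
    correct = sum(1 for f in features if all(rule(f) for rule in rules))
    share = correct / len(features) if features else 0.0
    return correct, share


# Hypothetical rule: a feature classified as "river" must have a positive width.
rules = [lambda f: f["class"] != "river" or f.get("width_m", 0) > 0]

features = [
    {"class": "river", "width_m": 12.0},
    {"class": "river", "width_m": 0.0},   # violates the rule
    {"class": "lake", "area_ha": 3.5},    # rule does not apply
]

print(semantic_accuracy(features, rules))  # (2, 0.6666666666666666)
```

Which rules belong in `rules` is itself decided by an information community. This is the same dependency between quality and semantics that the paper discusses.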
Salge describes in detail which issues have to be considered for semantic accuracy, e.g., the distinction between the real world, the perceived world, and specifications, as well as the conceptual schema of data. An observer working with a mobile GIS for data collection maps the "things" of a specific universe of discourse. These "things" can be the result of measurements with some kind of instrument, such as a thermometer or an earth observing satellite, they can be field observations, and some, if not all, data collected outdoors might be the result of human interpretation. Additionally, the observer records attributes that are often uncertain. This can be an uncertainty of definition; for example, two interpreters of the same land use database might assign different classes [19]. Many of these aspects have a semantic component, and we have to look for ways of dealing with these semantic issues adequately. To bridge to the next sections, two questions arise:
1. If there are no computerized measures to assess the semantic accuracy of a spatial dataset, one must interpret it from the viewpoint of the community GIC_A that produced the data. This makes it difficult for other communities GIC_n, potential users of the same data, to evaluate whether the data are fit for their use or not. How can this deficit be removed?
2. Problems such as those mentioned before [19] are well known but rarely considered within the framework of mobile GIS for data collection. What are the consequences for the multiple use of field collected spatial data?
One approach to support field workers during field data collection is the knowledge based capture of data. The automatic plausibility controls mentioned before enable the user to correct errors in the field, thus guaranteeing a higher quality of data. This is especially true for semantic plausibility checks [40]. They can be implemented in field GIS to warn the observer if semantic inaccuracies occur in the database. This approach comes close to a tool that supports the semantic accuracy of a dataset and thus fills a gap mentioned by Salge [43]. Knowledge based tools for the diagnosis of special natural conditions are another means of improving data quality. Such modules are described in the following section, including some remarks on their performance and the benefits of their use. These functions attempt to handle both data quality and semantics in mobile GIS, because the two are not independent of each other.
3. Digital field data collection: support through knowledge based GI-services
For the prototypical implementation of knowledge bases to be integrated into a field GIS, two options arose. The first was plausibility checks to control the semantic accuracy of the database underlying a mobile GIS application. The second was modules for diagnosing specific natural conditions that have to be assessed ecologically in concrete outdoor situations. The examples come from river habitat surveying, a task carried out at a scale that often requires intensive field work. Habitat surveying requires ecological assessments, and therefore detailed river habitat data have to be captured. Semantic plausibility checks for a specific river habitat surveying method were implemented as prototypes in PDC PROLOG for Windows and integrated into a mobile GIS software for data collection as dynamic link libraries [40]. They are an example of linking spatial data within a GIS tool with application specific knowledge [33].
3.1. Data collection supported by domain-specific, knowledge based functions
For river habitat surveying, the data collection method on which the mobile application is based requires the capture of more than 30 attributes that are used to describe the "naturalness" of a river. The attributes and their values are defined in the mapping instructions [47]. On the basis of these definitions, developed by experts of the German "State's Commission on Freshwaters", it was possible to formulate and implement rules that check the data while they are being collected in the field. The plausibility checks warn the field worker immediately if semantic inaccuracies occur. An exemplary rule that deals with the semantic accuracy of a specific attribute ("curve erosion") and opens a warning window on the screen of the mobile computer is as follows [40]:

IF   the river is mostly strengthened artificially
AND  there are very few or no banks along or across the river bed
THEN the curve erosion of the river cannot be sporadic (or frequent) and intense

To open the warning window, this rule is translated into Prolog as follows:

    % Warning clause for the attribute "curve erosion of the river".
    % It succeeds if the river section is mostly (> 50%) strengthened
    % artificially and banks cover less than 10% of the river bed.
    curve_erosion("Warning_Window",
        "Warning: Curve erosion can't be sporadic (or frequent) and intense, if river is strengthened and no banks are existing along or across the river bed") :-
        dbs(["river", "strengthened artificially", "> 50%"]),
        dbs(["river", "banks", "< 10%"]).

The plausibility check can be triggered during data acquisition, while data about the attribute "curve erosion of the river" are being entered. As a result, the field worker can immediately re-observe or re-think the situation and correct the database. Another way to support data quality is to provide tools that process a knowledge base and additional information from the user to produce a diagnosis. Such a knowledge base was developed for the same attribute, "curve erosion of the river". The aim is to produce a diagnosis of the current state of this attribute, which can be achieved by aggregating specific criteria by means of rules. Figure 1 shows a graph that describes how the criteria are aggregated. As a result of consulting the knowledge base, users get an automatic diagnosis of the attribute that they have to assess in the outdoor situation. An exemplary diagnosis is: "In this section sporadic and weak curve erosion of the river dominates."
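The curve-erosion rule above can also be sketched outside Prolog to show the mechanics of a non-blocking plausibility check. This is a Python illustration, not the DLL implementation described in [40]. The record fields are hypothetical. The thresholds (more than 50% artificially strengthened, less than 10% banks) are taken from the Prolog listing. The forbidden values are classes 4 and 5 of Table 1. The check only returns a warning and does not reject the entry, because the decision remains with the field worker.

```python
from dataclasses import dataclass
from typing import Optional

# Classes 4 and 5 of Table 1, which the rule forbids under the stated conditions.
INTENSE_CLASSES = {"sporadic and intense", "frequent and intense"}


@dataclass
class RiverSection:
    """Hypothetical field record for one surveyed river section."""
    strengthened_pct: float  # share of the section strengthened artificially (%)
    banks_pct: float         # share of the bed with banks along or across it (%)
    curve_erosion: str       # class entered by the field worker


def check_curve_erosion(section: RiverSection) -> Optional[str]:
    """Return a warning text if the entered class is implausible, else None."""
    mostly_strengthened = section.strengthened_pct > 50
    few_or_no_banks = section.banks_pct < 10
    if mostly_strengthened and few_or_no_banks and section.curve_erosion in INTENSE_CLASSES:
        return ("Warning: Curve erosion can't be sporadic (or frequent) and intense, "
                "if river is strengthened and no banks are existing along or across "
                "the river bed")
    return None


# Usage: the check runs as soon as the curve-erosion value is entered.
entry = RiverSection(strengthened_pct=80, banks_pct=5, curve_erosion="frequent and intense")
warning = check_curve_erosion(entry)
if warning is not None:
    print(warning)  # displayed to the worker, who may re-observe and correct the entry
```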
Figure 1. Criteria and aggregation process to produce a diagnosis for a specific river habitat attribute.
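The aggregation graph of Figure 1 is not reproduced in the text, so the following Python sketch only illustrates the general idea. Criteria are combined by rules into one of the five curve-erosion classes of Table 1. Each step is recorded, so the derivation is documented. A user who disagrees may choose another class but must give a reason. The two criteria (share of eroded curves and mean erosion depth) and their thresholds are hypothetical placeholders, not the criteria of the mapping instructions.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    curve_erosion: str                              # one of the Table 1 classes
    trace: list[str] = field(default_factory=list)  # documented derivation
    user_reason: str = ""                           # set only if the user deviates


def diagnose(eroded_curves: int, total_curves: int, mean_depth_m: float) -> Diagnosis:
    """Aggregate two hypothetical criteria into a curve-erosion class."""
    if not 0 <= eroded_curves <= total_curves:
        raise ValueError("eroded_curves must lie between 0 and total_curves")
    if eroded_curves == 0:
        return Diagnosis("none", ["no eroded curves observed -> none"])
    trace = []
    # Hypothetical threshold: at least half of the curves eroded counts as frequent.
    frequency = "frequent" if eroded_curves / total_curves >= 0.5 else "sporadic"
    trace.append(f"{eroded_curves}/{total_curves} curves eroded -> {frequency}")
    # Hypothetical threshold: a mean erosion depth of 0.5 m or more counts as intense.
    intensity = "intense" if mean_depth_m >= 0.5 else "weak"
    trace.append(f"mean erosion depth {mean_depth_m} m -> {intensity}")
    return Diagnosis(f"{frequency} and {intensity}", trace)


def override(d: Diagnosis, chosen_class: str, reason: str) -> Diagnosis:
    """Let the user choose another class, but only with a stated reason."""
    if not reason.strip():
        raise ValueError("a reason is required to deviate from the diagnosis")
    note = f"user chose '{chosen_class}' instead of '{d.curve_erosion}': {reason}"
    return Diagnosis(chosen_class, d.trace + [note], reason)


d = diagnose(eroded_curves=2, total_curves=10, mean_depth_m=0.2)
print(d.curve_erosion)  # sporadic and weak
for step in d.trace:
    print(step)
```

Keeping the trace as an explicit list reflects the point made below: a knowledge base can document its derivation automatically, and a deviating decision by the user is recorded alongside it.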
Table 1. Exemplary attribute and its classifications within a specific river habitat surveying method.

Attribute                     Classes
Curve erosion of the river    1. none
                              2. sporadic and weak
                              3. frequent and weak
                              4. sporadic and intense
                              5. frequent and intense
This diagnosis refers to a classification proposed in the mapping instructions, which contains the classes shown in Table 1. Apart from the concrete suggestion of a class (see Table 1) for a specific attribute, the advantage of using a knowledge base is that the derivation of the result can be documented automatically and completely. Both types of knowledge bases, the tools for plausibility control and for diagnosis, allow the observers to reconsider the outdoor situation and find out (1) why semantic inaccuracies occurred in the database or (2) how the part of the landscape they are observing has to be assessed adequately. They can correct the database while observing the real situation rather than later in the office, where it is very difficult to keep all relevant facts in (human) memory. The benefit for data quality comes from the fact that the user is required to take the whole model into account to carry out a specific diagnosis. This produces a qualitatively better result than ad hoc decisions in outdoor situations, which sometimes fail. Such knowledge bases are clearly different from approaches that aim to inform the user about the quality of data through
* textual or graphical information,
* the calculation of indicators, or
* metadata.
Instead of providing such descriptions, the knowledge bases are dynamically integrated into the mobile GIS and directly interrupt the data collection process when inconsistencies occur. This helps to avoid errors and logical mismatches instead of merely describing their potential occurrence.

3.2. Performance and usability aspects
The motivation for developing the tools described above was the idea that identifying semantic errors while in the field can save time (and therefore money), because such errors can be avoided or corrected immediately. This can be seen as a usability requirement of such a tool [39]. The approaches pursue the goal of putting quality information at the disposal of the end user so that it can be taken into account in later decision making [14].
The goal of the knowledge bases is to support the field workers in their current work and to guarantee a defined, GIC dependent quality standard of the data. Users should use the tools only if they agree with such GIC dependent arrangements; if such agreement cannot be achieved, the data will not fit their purposes. Tests have been carried out using the prototypes described above in various data collection procedures [39], [40]. The results have shown that the field work took slightly more time. At first glance, this is surprising. The reason becomes obvious when we take into account that the field worker was forced several times to correct the database, either because of warnings from the semantic plausibility control or because of the use of the diagnosis functions. The benefit of such a knowledge based data collection system comes from using these functions: the users rated the quality of the data more positively. They could rely on the data after the field work in terms of semantic accuracy and other criteria. This also meant that it was not necessary to return to the field because of inconsistencies in the database detected many days (or weeks) after the field work. From this point of view, the tool was evaluated as more effective than traditional data collection with paper forms or with pure data input masks without any user support. The performance of the GIS tool did not suffer from the additional functions, because they are integrated into the given application [23] as specific dynamic link libraries (DLLs) implemented in Prolog [40], which cause no processing problems on today's portable computers. The automatic control of data quality during outdoor data collection still needs human intervention to a certain extent: the DLLs that perform the semantic plausibility checks warn the user if inconsistencies occur in the evolving database.
However, it is up to the user whether to accept the warnings or not. This is because natural systems (in this case river habitats) are individual, and every datum collected must be seen in its specific ecological context, which can change several times over short (spatial) distances. The same is true of the knowledge bases that support diagnosis: if users do not agree with the result produced by the knowledge base, they can decide otherwise but must give reasons for their decision. In sum, all these knowledge based functions support field workers and help them carry out the data collection process in a solid and reliable manner. The whole process of data collection can be automated only to a certain extent, because of the specific semantic constraints and the specificity of the knowledge to be processed.

3.3. Some additional remarks on data quality and semantics
The data model and the quality constraints refer to the dependencies between semantics and data quality, and the term "semantic plausibility control" reflects the connection between the two [40], [41]. Such dependencies are of various kinds. The meaning of a word or a sentence depends as much on who utters it as on where, when, and to whom it is uttered [25]. To take a spatial example from hydrology, a "river" may include the bank strip and the floodplain in an ecological information community, whereas it is reduced to the water bed in an engineering information community. Engineers may see "curve erosion" as a disturbing factor, because erosion requires artificial reinforcement. Ecologists may regard it as desirable, because curve erosion leads to the development of habitats for plants and animals. The term "plant", in turn, can have multiple meanings that have to be represented differently [25]. Cultural and linguistic aspects must also be mentioned. A term like "lake" can be interpreted differently not only between information communities within a country, but especially in an international context. Mark [30] gave an example, emphasizing that a "lake" (English), a "lac" (French), a "lago" (Spanish) and a "See" (German) are not the same. Others, such as [22], [42], discuss case studies that illustrate problems arising from this kind of uncertainty. Figure 2 shows various tools aimed at supporting the field observer. Like the tools described before, most of them clearly show the application dependency. The figure underlines the need to look more closely at such dependencies and their handling within mobile GIS. These examples lead to the next section, which argues for data models that explicitly take into account semantics and data quality constraints. We have to look for solutions that can inform users about such dependencies, which are part of the GIC specific data model, and ask whether such dependencies can be incorporated into the data modeling and data communication process. Ontologies are considered an approach that could help to include information about the dependencies between data quality and semantics in comprehensive, semantically enriched data descriptions.
Figure 2. Tools to support digital field data acquisition and their semantic dependency on the specific world view of a GIC.
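The dependency described in this section can be illustrated with a small Python sketch. A quality rule can only be evaluated where a community's definition of a feature contains the parts the rule refers to. The two definitions of "river" follow the ecological and engineering views mentioned above. The part names and the two rule names are hypothetical.

```python
# Hypothetical definitions of "river" in two information communities:
# ecologists include bank strip and floodplain, engineers only the water bed.
RIVER_DEFINITIONS = {
    "ecology": {"water_bed", "bank_strip", "floodplain"},
    "engineering": {"water_bed"},
}

# Hypothetical quality rules, each listing the parts of "river" it refers to.
RULES = {
    "floodplain_connectivity": {"water_bed", "floodplain"},
    "bed_strengthening": {"water_bed"},
}


def usable_rules(gic: str) -> list[str]:
    """Return the rules whose referenced parts all exist in the GIC's definition."""
    parts = RIVER_DEFINITIONS[gic]
    return sorted(name for name, needs in RULES.items() if needs <= parts)


print(usable_rules("ecology"))      # ['bed_strengthening', 'floodplain_connectivity']
print(usable_rules("engineering"))  # ['bed_strengthening']
```

In this sketch, the floodplain rule written for the ecological community cannot be applied under the engineering definition. Section 4 proposes making this kind of dependency explicit in the data model.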
4. Toward semantically enriched and communicable spatial data models
Sharing spatial data and services within and between GICs requires proper consideration of the dependencies described above. This is especially true for concrete GIS applications such as the field data collection tool mentioned before. Metadata are the usual means of providing information about GIC specific semantics. Unfortunately, current metadata, if available at all, do not take such dependencies into account adequately. They are mostly limited to short descriptions of the classic quality parameters (the "famous five", see Section 2, and, for example, the "Core Metadata Elements" in [1]). This echoes the criticism that one root of the ineffective use of existing data is the lack of clearly defined metadata for the datasets [34]. "Abstractive quality" has been presented as "data quality which supports or informs the linkage between the real world and the terrain nominale" [13, p. 57]. This comes close to the definition of "semantic accuracy". The interesting point in the definition of "abstractive quality" is that it includes the aspect of informing the user: users must be informed about the semantics and semantic constraints of spatial data if they want to assess their fitness for use. Data quality as it is currently handled is not readily accessible to the user [3]. The question is how to achieve this with an approach other than the classic metadata descriptions, one that takes more explicit account of the context in which quality constraints occur.

4.1. Ontology-based models
If multiple use of field collected data is envisaged, techniques are needed that enable potential users to evaluate whether or not they can use the data for their specific tasks. Ontologies are meant to be one of the "methodologies" that could close the existing gap. Ontologies are designed to capture the assumptions and intended meanings of the concepts and statements in a particular domain [7], for instance river habitat surveying. Ontologies can serve as vehicles to capture the quality requirements of one or more GICs. If such quality requirements are defined as properties of spatial features within an ontology, the spatial features modeled with the ontology are consistent within a user's domain. Data modeling based on ontologies is a helpful approach that supports the goal of spreading comprehensive information about GIC specific data among users. The Internet, as an example of a means to distribute information to as many users as possible, requires specific consideration of how to represent and describe spatial data. Basically, there are two scenarios for the application of ontologies [26]:
* the "neutral authoring" scenario, where the ontology is developed at a "neutral" site and then used by different (knowledge based) systems;
* the "ontology as specification" scenario, which describes an ontology of a specific part of the world (e.g., a river and its catchment).
Within this investigation the second scenario is preferred. Domain ontologies are specifications of the conceptualizations that have been developed by specific GICs. The search for a language suitable for describing such ontologies led to RDF. RDF is still under development at the W3C but is expected to become a standard [48]. With the help of RDF it is possible to describe the conceptual data models, or domain ontologies, used in certain GICs. RDF/Schema is a language that supports data retrieval and description in Web-based applications [48]. RDF/Schema can furthermore be used to describe documents that contain the data models together with their semantics. Every RDF based model contains features described as resources. Resources are things that may be assigned a uniform resource identifier (URI). They are instances of a basic Class. This establishes a link between the RDF representation of features, their formalization using RDF/Schema, and the object-oriented paradigm, in which classes encapsulate not only the data and metadata but also the methods working on the data. In RDF, classes can be represented hierarchically using graph representations, as shown in Figure 3. If a class is a subset of another, there is an rdfs:subClassOf arc from the node representing the first class to the node representing the second. Figure 3 shows an RDF data model that builds on the details elaborated in Section 3. If a resource is an instance of a class, there is an rdf:type arc from the resource to the node representing the class. Classes in RDF can have several properties, some of them predefined [48]. Several properties can be used to describe the characteristics or attributes of other Web resources. RDF statements have three components. They name a property by URI, name the (URI-identified) Web resource it applies to, and give the value of that property, either as another (URI-named) resource or as a literal string such as "curve erosion" or "The University of Applied Sciences Harz of Wernigerode". Some properties can be used in many contexts; others make sense only when applied to some sub-category of resources or when the value takes some constrained form [9]. Examples of such predefined properties are glossary, help, and appendix; further properties are isDefinedBy, comment, and seeAlso. They are represented in Figure 4 and continue the examples in Section 3. Describing a resource using RDF/Schema has two goals:
1. The resources containing information about an object, e.g., a river, are identified.
2. The resources themselves are described, i.e., the object classes, objects, and properties as well as the relationships between them. This is done by means of properties (Figures 3 and 4).

Figure 3. RDF based data model for "curve erosion" including some properties and their values, resulting from a habitat survey.

Figure 4. Predefined properties in RDF.

Table 2 shows the specifications defined by the W3C for some predefined, RDF conformant properties, some of which are represented in Figure 4.

Table 2. RDF properties and their specifications.

Property      Specification
isDefinedBy   indicates a resource containing and defining the subject resource
seeAlso       indicates a resource that provides information about the subject
comment       used for human language descriptions of the resource
glossary      refers to a document providing a glossary of terms that pertain to the document
help          refers to a document offering help (more information, links to other sources of information, etc.)
appendix      refers to a document serving as an appendix in a collection of documents

According to Table 2, there are various opportunities to include information within RDF descriptions that could support the interpretability of data. This would be additional information describing in detail the classes, objects, and properties occurring in an RDF diagram. As a first attempt, one could use comment, for instance, to describe the resource itself and, additionally, to state quality constraints. Referring to the example of a semantic plausibility check in Section 3, such a constraint could be represented by carrying forward the rules that were already implemented in Prolog in the mobile GIS. To support users who are often not experts in programming languages, it makes sense to describe such rules in natural language. The exemplary RDF/Schema code, which is very similar to XML, is as follows (note that this is an example and there is currently no known freshwater commission or similar organization that has set up RDF/Schema code like this):

    <rdf:RDF xml:lang="en"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
      <rdf:Description rdf:ID="curve_erosion_of_the_river">
        <rdf:type rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Class"/>
        <rdfs:subClassOf rdf:resource="http://www.hydrolg.org#river"/>
        <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
        <rdfs:isDefinedBy>Freshwater Commission</rdfs:isDefinedBy>
        <rdfs:comment>
          Data Quality Constraints:
          Rule No 3: (. . .)
          Rule No 4:
            Explanation: Rule No 4 explains the dependency between artificially influenced freshwaters and the occurrence of morphological structures
            Background: The mapping instructions for river habitat surveying of the State's Commission on Freshwaters (1999)
            Definition:
              IF   "the river is strengthened artificially"
              AND  "no banks along or across the river bed",
              THEN "the curve erosion of the river cannot be sporadic (or frequent) and intense".
Rule No 5: (. . .) 5/rdfs:comment> 5/rdfs:seeAlso rdf:resource ``../OtherRiverHabitatSpeci®cMetadata.rdf''/4 5/rdfs:seeAlso> 5/rdf:Description> 5/rdf:RDF>
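To make the triple structure and the rdfs:subClassOf and rdf:type arcs described above more concrete, the following Python sketch stores RDF statements as plain (subject, predicate, object) tuples and infers every class a resource belongs to by following rdf:type once and rdfs:subClassOf transitively. It is an illustration under stated assumptions, not an RDF toolkit: the identifiers (CurveErosion, MorphologicalStructure, RiverHabitatFeature, site42_erosion, intensity) are hypothetical placeholders loosely modeled on figure 3.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical triples loosely modeled on figure 3; these identifiers are
# illustrative placeholders, not part of any published vocabulary.
TRIPLES = [
    ("CurveErosion", RDFS_SUBCLASS_OF, "MorphologicalStructure"),
    ("MorphologicalStructure", RDFS_SUBCLASS_OF, "RiverHabitatFeature"),
    ("site42_erosion", RDF_TYPE, "CurveErosion"),
    ("site42_erosion", "intensity", "sporadic"),
]


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf arcs."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            # The `o not in found` test also guards against cyclic hierarchies.
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def classes_of(resource, triples):
    """Direct rdf:type classes of a resource plus all of their superclasses."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result


def property_values(resource, prop, triples):
    """Return the objects of all statements of the form (resource, prop, ?)."""
    return [o for s, p, o in triples if s == resource and p == prop]


if __name__ == "__main__":
    print(sorted(classes_of("site42_erosion", TRIPLES)))
    print(property_values("site42_erosion", "intensity", TRIPLES))
```

Walking the hierarchy this way shows why a single rdf:type arc is enough: membership in the more general classes follows from the rdfs:subClassOf arcs, which is the same reading a user would apply to a graph such as figure 3.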
The RDF extension OIL provides possibilities to define such rules directly within RDF. An example is given in [10]; it has been modified as follows:
    <rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:syllogism="http://old.Greece/syllogism/">
      <rdf:type rdf:resource="http://www.ontoknowledge.org/oil/rdfschema#RuleBase"/>
      <syllogism:premise>if river is strengthened artificially, and no banks along or across river bed, curve erosion cannot be sporadic (or frequent) and intense</syllogism:premise>
      <syllogism:fact>river strengthened, and no banks</syllogism:fact>
      <syllogism:conclusion>curve erosion cannot be sporadic (or frequent) and intense</syllogism:conclusion>
    </rdf:Description>
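The premise, fact and conclusion above can also be read as an executable semantic plausibility check that runs while a field worker enters data. The mobile GIS described in this paper implemented such rules in Prolog; the following is only a Python sketch of Rule No 4, assuming a hypothetical record layout (the field names and the vocabulary values "none", "sporadic", "frequent", "weak", "intense" are illustrative, not the survey's official codes).

```python
from dataclasses import dataclass


@dataclass
class HabitatRecord:
    """One field-collected river habitat record (hypothetical schema)."""
    artificially_strengthened: bool
    has_banks: bool               # banks along or across the river bed
    curve_erosion_frequency: str  # assumed values: "none", "sporadic", "frequent"
    curve_erosion_intensity: str  # assumed values: "weak", "intense"


def check_rule_4(rec):
    """Return a warning text if the record violates Rule No 4, else None.

    IF the river is strengthened artificially AND there are no banks along
    or across the river bed, THEN curve erosion cannot be sporadic (or
    frequent) and intense.
    """
    premise = rec.artificially_strengthened and not rec.has_banks
    conclusion_violated = (
        rec.curve_erosion_frequency in ("sporadic", "frequent")
        and rec.curve_erosion_intensity == "intense"
    )
    if premise and conclusion_violated:
        return ("Rule No 4: curve erosion cannot be sporadic or frequent and "
                "intense on an artificially strengthened river without banks")
    return None


if __name__ == "__main__":
    plausible = HabitatRecord(True, False, "none", "weak")
    implausible = HabitatRecord(True, False, "frequent", "intense")
    print(check_rule_4(plausible))
    print(check_rule_4(implausible))
```

Returning a message rather than raising an error mirrors the idea of a plausibility control: the field worker is warned about an inconsistency and can correct the entry on site.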
RDF enables the user not only to find data sources (see the type resource statement) but also to understand their meaning. If the RDF-based graphical approach (figures 3, 4) and the descriptions implemented via RDF/Schema are provided to users via the Internet, they gain convenient support for assessing the fitness for use of the field-collected datasets. The RDF description can furthermore point to other Internet pages where "curve erosion" or related terms are described. Such means lead to the comprehensive data descriptions mentioned before. Considering that RDF provides several further capabilities [48], this seems to be a path toward semantically enriched data models that should be pursued more intensively in the future. This approach should, however, be seen as a contribution to answering the question "how should (. . .) geographic data be described and categorized so that they can be understood and communicated?" [16].
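How a client application might pull the interpretive information (isDefinedBy, comment, seeAlso) out of an RDF/XML description to support a fitness-for-use assessment can be sketched with Python's standard xml.etree.ElementTree module. This is a simplified illustration, assuming a shortened, hypothetical version of the curve erosion description; it reads the XML tree directly and is not a full RDF parser, so other RDF/XML syntax variants would not be handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"

# Shortened, hypothetical description modeled on the curve erosion example.
DOC = """<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
  <rdf:Description rdf:ID="curveErosionOfTheRiver">
    <rdfs:subClassOf rdf:resource="http://www.hydrolg.org#river"/>
    <rdfs:isDefinedBy>Freshwater Commission</rdfs:isDefinedBy>
    <rdfs:comment>Rule No 4: IF the river is strengthened artificially AND no
      banks along or across the river bed THEN curve erosion cannot be
      sporadic (or frequent) and intense.</rdfs:comment>
    <rdfs:seeAlso rdf:resource="../OtherRiverHabitatSpecificMetadata.rdf"/>
  </rdf:Description>
</rdf:RDF>"""


def describe(xml_text):
    """Collect definition source, comments and see-also links per resource."""
    root = ET.fromstring(xml_text)
    result = {}
    # ElementTree expands prefixes to "{namespace-uri}localname" tags.
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        rid = desc.get(f"{{{RDF_NS}}}ID")
        result[rid] = {
            "isDefinedBy": [(e.text or "").strip()
                            for e in desc.findall(f"{{{RDFS_NS}}}isDefinedBy")],
            "comment": [(e.text or "").strip()
                        for e in desc.findall(f"{{{RDFS_NS}}}comment")],
            "seeAlso": [e.get(f"{{{RDF_NS}}}resource")
                        for e in desc.findall(f"{{{RDFS_NS}}}seeAlso")],
        }
    return result


if __name__ == "__main__":
    for rid, info in describe(DOC).items():
        print(rid, "defined by", info["isDefinedBy"], "see also", info["seeAlso"])
```

A user interface could display the comment text next to the dataset and offer the seeAlso links for further reading, which corresponds to the pointer to other Internet pages mentioned above.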
4.2. Coupling different approaches
For future research it would be valuable to investigate whether other formalization paradigms can be used to describe data and the related quality information. Kuhn, for instance, proposes the algebraic description and interpretation of semantics and of semantic networks, the latter being a widespread approach to representing application-specific knowledge [27], [28]. He notes that the combination of semantic networks with algebra provides modeling power that is complete with respect to ontological requirements as well as with respect to implementability [28]. A further step towards a formalization of comprehensive quality information should possibly include such algebraic representations. This would be a logical step, given the similarity between the graph-based representations in figures 1, 3 and 4 and semantic networks. Object-oriented programs that, like RDF, are based on the class paradigm could also include such approaches. They would enable software developers to include quality controls as well as descriptions of quality constraints directly within the classes [4]. Further research should concentrate on such issues. Another issue is the extension of RDF Schema. Some authors argue, for example, that RDF Schema provides means to define vocabulary, structure and constraints for expressing metadata about Web resources, but that it lacks a formal semantics for the primitives it defines [10]. They suggest adding the ontology language OIL as an extension of RDF Schema, which might make RDF Schema better suited to describing spatial objects comprehensively.
5. Conclusions
At the core of the problems taken up here is the inability of applications to "understand" content. Learning to understand information about content is a major step toward developing a solution [44]. Combining approaches to handle GIC-specific semantics and data quality (Section 3) with descriptions of data based on ontologies, possibly linked to formal approaches (Section 4), is meant to overcome current shortcomings in data modeling and data communication. The development of special tools for mobile geocomputing requires comprehensive knowledge of the meaning of the objects that have to be collected outdoors. The differences in how information communities define their universe of discourse are often substantial, especially if different GICs use similar or identical terms but have different semantics in mind. This obviously causes problems for data sharing between such information communities, but also for the interoperability of GI services like those described in Section 3. As long as GICs work on the basis of specific catalogs of spatial objects that are relevant within their particular universe of discourse, these problems will not be solved satisfactorily, and the basic difficulties of data transfers within and between GICs will remain. Once spatial data transfers are recognized as communication processes, it becomes evident that they require first and foremost a shared conceptual basis among the participants; questions of common formats then become secondary [27]. Such communication processes must keep what they promise: they must communicate the semantics of data as well as the quality requirements and constraints specific to the collectors of the data. Establishing a conceptual basis requires assessing the semantic foundations of different GICs in order to identify common areas, which in turn allows tools to be developed that translate between the GICs. A vehicle for finding such common areas is a formal ontology [6].
The ontology-based approach presented here is considered a step toward semantically enriched data models that can be communicated using existing, working formalisms. For reasons of usability, the parameters used can be included in the description of data resources with RDF and RDF/Schema. Such descriptions could possibly be combined with approaches to visualize data quality [45]. The approach suggested is pragmatic because it is based on existing technologies and methods for formalizing knowledge. Even so, it contributes to the aim of making quality information accessible in the course or context of an application [3]. Ontology-based data modeling and the formalization of quality constraints using a language such as RDF and RDF Schema are a step forward on the thorny path that leads to usable data quality descriptions. Such descriptions should
* be accessible, assessable, and sharable between different GICs via the Internet, and
* be comprehensive, explicitly considering the dependencies between data quality and semantics.
References
1. ANZLIC. Core Metadata Elements for Land and Geographic Directories in Australia and New Zealand. http://www.anzlic.org.au/metaelem.htm, 1997.
2. R.B. Bailey. Ecosystem Geography. Springer, New York, Berlin, 1996.
3. K. Beard. "Representations of Data Quality," in M. Craglia and H. Couclelis (Eds.), Geographic Information Research: Bridging the Atlantic, Taylor & Francis, London, Philadelphia, 280–294, 1997.
4. L. Becker, L. Bernard, J. Döllner, S. Hammelbeck, K.H. Hinrichs, T. Krüger, B. Schmidt, and U. Streit. "Integration dynamischer Atmosphärenmodelle mit einem (3+1)-dimensionalen objektorientierten GIS-Kern (Integration of dynamic atmospheric models with a (3+1)-dimensional object-oriented GIS kernel)," Umweltinformatik 99, Metropolis Verlag, Marburg, Germany, 429–442, 1999.
5. S. Biörklund. "GPS in powerful combination with geographical databases and digital maps," Proc. of the 18th Int. Cartographic Conference, Stockholm, Vol. 4: 2117–2124, 1997.
6. Y. Bishr, H. Pundt, and Chr. Rüther. "Proceeding on the road of semantic interoperability: Design of a semantic mapper based on a case study from transportation," Interoperating Geographic Information Systems, Lecture Notes in Computer Science 1580, 203–215, 1999.
7. Y. Bishr and W. Kuhn. The Role of Ontologies in Modelling Geospatial Features. IfGI prints 5, Institute for Geoinformatics, University of Münster, Wissenschaftsverlag, Solingen, 1999.
8. M. Breunig. Integration of Spatial Information for Geo-Information Systems. Springer, Lecture Notes in Earth Sciences 61, Heidelberg, New York, 1996.
9. D. Brickley. RDF Sitemaps and Dublin Core Site Summaries. http://rudolf.opensource.ac.uk/about/specs/sitemap.html, 2000.
10. J. Broekstra, M. Klein, S. Decker, D. Fensel, F. van Harmelen, and I. Horrocks. "Enabling knowledge representation on the Web by extending RDF Schema," in Proceedings of the Tenth World Wide Web Conference WWW'10, 467–478, 2001.
11. CEN. Geographic Information – Data Description – Quality. Draft European Standard, PrEN 12656, European Committee for Standardization, Brussels, 1996.
12. E.F. Codd. "Data models in database management," Proceedings Workshop on Data Abstraction and Conceptual Modeling. Cited after: D. Peuquet. "A Conceptual framework and comparison of spatial data models," Introductory Readings in Geographic Information Systems, Taylor & Francis, London, New York, 230–245, 1991.
13. M. Duckham and J. Drummond. "Implementing an object-oriented approach to data quality," in B. Gettings (Ed.), Integrating Information Infrastructures with GI Technology, Taylor & Francis, London, Philadelphia, 53–64, 1999.
14. S. Faiz, J.P. Nzali, and P. Boursier. "Representing the quality of geographic information depending on the User Context," Proc. of the Joint Europ. Conf. on Geogr. Information, Barcelona, Vol. 1: 73–76, 1996.
15. A. Frank. "Metamodels for data quality description," in M.F. Goodchild and R. Jeansoulin (Eds.), Data Quality in Geographic Information: From Error to Uncertainty, Edition Hermes, Paris, 15–30, 1998.
16. M. Gahegan. "Characterizing the semantic content of geographic data, models, and systems," in M. Goodchild, M. Egenhofer, R. Fegeas, and C. Kottmann (Eds.), Interoperating Geographic Information Systems, Kluwer Academics, 71–83, 1999.
17. K. Gardels. A Comprehensive Data Model for Distributed, Heterogeneous Geographic Information. http://www.regis.berkeley.edu/gardels/geomodel_def.html, 1996.
18. M.F. Goodchild and S. Gopal. Accuracy of Spatial Databases. Taylor & Francis, London, 1989.
19. M.F. Goodchild. "Attribute accuracy," in S.C. Guptill and J.L. Morrison (Eds.), Elements of Spatial Data Quality, Elsevier Science Ltd., BPC Wheatons Ltd, Exeter, UK, 59–79, 1995.
20. M.F. Goodchild. "The spatial data infrastructure of environmental modeling," GIS and Environmental Modeling: Progress and Research Issues, GIS World Books, Fort Collins, USA, 11–15, 1996.
21. F. Harvey. "Quality is Contextual," in M.F. Goodchild and R. Jeansoulin (Eds.), Data Quality in Geographic Information: From Error to Uncertainty, Edition Hermes, Paris, 37–42, 1998.
22. F. Harvey, W. Kuhn, W.H. Pundt, Y. Bishr, and C. Riedemann. "Semantic interoperability: a central issue for sharing geographic information," The Annals of Regional Science, Vol. 33(2), Springer-Verlag, Berlin, Heidelberg, 213–232, 1999.
23. C. Heisig. Mobile Erfassung raumbezogener Daten mit Pen-Computern und GPS (Mobile Acquisition of Spatial Data with Pen-Computers and GPS). http://www.conterra.de/service/presse/CH_HP983.htm, 1998.
24. ISO. http://www.statkart.no/isotc211/protdoc/211n919/211n919.doc, 2000.
25. P.C. Jackson. Introduction to Artificial Intelligence. Dover Science Books, Dover Publications, Inc., New York, 1985.
26. R. Jasper and M. Uschold. A Framework for Understanding and Classifying Application Ontologies. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers/Uschold2/final-ont-apn-fmk.pdf, 1999.
27. W. Kuhn. "Defining semantics for spatial data transfers," Proc. of the 6th Int. Symp. on Spatial Data Handling, Edinburgh, Vol. 2: 973–987, 1994.
28. W. Kuhn. An Algebraic Interpretation of Semantic Networks. Spatial Information Theory. Proc. of the Int. Conference COSIT '99, Lecture Notes in Computer Science, Springer, 331–347, 1999.
29. S.Y. Lam and Y.Q. Chen. "Ground-based positioning techniques," in Y.Q. Chen and Y.C. Lee (2001), Geographical Data Acquisition, Springer, Wien, New York, 85–97, 2001.
30. D.M. Mark. Toward a Theoretical Framework for Geographic Entity Types. Spatial Information Theory, Lecture Notes in Computer Science 716, Springer, 270–283, 1993.
31. D. Martin. Geographic Information Systems, 2nd edition, Routledge, London, New York.
32. T.H. Merret, T.J. Otoo, A. Thiyagarajah, A. Valdivia-Martinez, and X.Y. Zhao. "Interoperation of heterogeneous GIS via database communication: a prototype," Proc. 8th Int. Symp. on Spatial Data Handling, University Vancouver, Canada, 277–286, 1998.
33. D.R. Miller. Knowledge-Based Systems for Coupling GIS and Process-Based Ecological Models. GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, USA, 231–234, 1996.
34. H. Moellering. "Metadata: an essential component of the spatial data environment," Proc. of the 18th International Cartographic Conference, Stockholm, Vol. 4: 2076–2083, 1997.
35. J.M. Morrison. Spatial Data Quality. Elements of Spatial Data Quality. Elsevier Science Ltd., BPC Wheatons Ltd., Exeter, UK, 1–12, 1995.
36. OGC. Open GIS Consortium. http://www.opengis.org/, 2000.
37. A. Östmann. "Quality systems for spatial data," Proc. of Joint European Conference and Exhibition on Geographical Information, Barcelona, March 27–29, 1996. Proceedings Vol. 1, IOS Press, Amsterdam, Tokyo, 268–276, 1996.
38. A. Östmann. "The specification and evaluation of spatial data quality," Proc. of the 18th International Cartographic Conference, Stockholm, Vol. 4: 836–847, 1997.
39. H. Pundt, A. Hitchcock, M. Bluhm, and U. Streit. "A GIS supported freshwater information system including a Pen-Computer component for digital field data recording," Proceedings of the International Conference on GIS for Hydrology and Water Resources Management, IAHS Publication No. 235, 703–711, 1996.
40. H. Pundt. Wissensbasierte Komponenten zur Verbesserung der Datenqualität bei digitalen Feldkartierungen (Knowledge-based components for the improvement of data quality during data acquisition). Angew. Geographische Informationsverarbeitung IX. Salzburger Geographische Materialien, Heft 26, University of Salzburg, Austria, 105–114, 1997.
41. H. Pundt and Y. Bishr. Ontologies to support Environmental Diagnoses and Semantic Mapping. Forschungsberichte Informatik/Mathematik der Universität Bremen, Vol. 5(5): 51–67, 1999.
42. C. Riedemann and W. Kuhn. What Are Sports Grounds? or: Why Semantics Requires Interoperability. Interoperating Geographic Information Systems. Lecture Notes in Computer Science 1580, Springer, 217–229, 1999.
43. F. Salgé. "Semantic Accuracy," in S.C. Guptill and J.L. Morrison (Eds.): Elements of Spatial Data Quality, Elsevier Science Ltd., BPC Wheatons Ltd, Exeter, UK, 139–151, 1995.
44. L. Shklar. Java, RDF and the "Virtual Web". http://www.gamelan.com/journal/techfocus/090199_rdf1.html, 2000.
45. F.J. van der Wel, R.M. Hootsmans, and F. Ormeling. "Visualization of data quality," Visualisation in Modern Cartography, Pergamon/Elsevier Science, Oxford, New York, 313–331, 1994.
46. G. Wiederhold. "Mediators to deal with heterogeneous data," in A. Vckovski, K.E. Brassel, and H.J. Schek (Eds.): Interoperating Geographic Information Systems. Lecture Notes in Computer Science, Springer Verlag, Berlin, New York, 1–16, 1999.
47. M. Wilde. Konzeption und prototypische Entwicklung einer SICAD-GIS-Komponente "Gewässerökologie" für das Umweltinformationssystem der Stadt Gütersloh (Conception and prototypical Implementation of a SICAD-based GIS Component "River Ecology" for the Environmental Information System of the City of Gütersloh). Diploma Thesis at the Institute for Geoinformatics, University of Muenster. 85 (unpublished), 2001.
48. W3C. Resource Description Framework (RDF) Schema Specification 1.0. http://www.w3.org/TR/WD-rdfschema, 2000.
Hardy Pundt received his Ph.D. at the University of Münster in 1995 with a thesis on the integration of knowledge-based systems and field-based GIS. From 1996 to 2002 he worked as an assistant at the Institute for Geoinformatics (IfGI) in Münster. Since April 2002 he has been Professor for Geoinformation at the University of Applied Sciences Harz in Wernigerode, Germany. His research interests focus on mobile GIS, semantics of geoinformation, and ontologies. He is chairman of the working group on environmental modeling (EMOD) of the Association of Geographic Information Laboratories in Europe (AGILE).