Towards Topological Consistency and Similarity of Multiresolution Geographical Maps

Alberto Belussi

Barbara Catania, Paola Podestà

Dipartimento di Informatica University of Verona, Italy

Dipartimento di Informatica e Scienze dell’Informazione University of Genoa, Italy


{catania,podesta}@disi.unige.it

ABSTRACT

Several application contexts require the ability to compare and jointly use different geographic datasets (maps) covering the same or overlapping areas. This is the case, for example, of mediator systems, which integrate distinct data sources for query processing, and of GISs dealing with multiresolution maps. In both cases, distinct maps may represent the same geographic feature with different geometry types (a road can be a region in one map and a line in another). An important issue is therefore determining whether two multiresolution maps are consistent, i.e., whether they represent the same area without contradictions, and, if not, whether they are at least similar. In this paper we consider the consistency and similarity of multiresolution maps with respect to topological information. Existing approaches do not take feature geometry type into account. We extend them with two notions of topological consistency: the first requires the same topological relation between pairs of common features; the second relaxes the first by considering similarity between topological relations. A similarity function for multiresolution maps is then provided, taking into account both feature geometry types and the topological relations of map objects. We finally discuss how the proposed consistency and similarity concepts can be used in GIS applications. Some experimental results are also reported to show the effectiveness of the proposed approach.

Categories and Subject Descriptors H.2 [DATABASE MANAGEMENT]: Database Applications—Spatial Databases and GIS

General Terms Experimentation, Measurement, Theory

Keywords GIS, Topology, Consistency, Similarity, Multiresolution

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GIS’05, November 4, 2005, Bremen, Germany. Copyright 2005 ACM 1-59593-146-5/05/0011 ...$5.00.

1. INTRODUCTION

In recent times, geographic data have been increasingly used in GISs to support critical decision processes. They often correspond to datasets collected and integrated from different sources (private or public institutions), which can be produced by different types of processes (e.g., social, ecological, economic) over a geographic area, at different times (e.g., every 10 years). Such datasets, often called maps, are sets of geographical features spatially described by geometric objects embedded in a reference space and/or by the topological relationships existing among them. Topological information can be derived from geometric information, but it can also concern features that are not geometrically represented. Depending on the application context, geographic data can appear in different forms. This may lead to multiple representations of the same geographic features in distinct maps, possibly with different dimensions. For example, a road can be represented as a region for an ecological process, whereas it can be represented as a line for a traffic analysis process. This corresponds to a form of multiresolution that must be managed effectively in order to handle all these representations in the same environment. Since data integration is becoming increasingly common in many contexts, several GIS applications have to deal with different geographic datasets that can be considered multiresolution maps. Among others, in this paper we investigate the following application contexts:

• Query processors of multiresolution GISs, where queries can be executed over different datasets since different resolution levels are available. Depending on the request, it may be convenient to execute the query on the most generalized maps available, as in the case of a query coming from a remote application on the Web, or it may be necessary to use more specialized maps, when the request comes from a local application and the user is interested in detailed geometric descriptions.
• Tools for the consistency analysis of datasets obtained from different sources: they are required to evaluate whether multiresolution maps can be compared or used together in processing activities.
• Tools for detecting and analyzing changes that occur over time in a given area or across overlapping areas: these tools are necessary to estimate the amount of change between maps before starting a more accurate comparison of their content.

Another interesting application of multiresolution maps is the case of mediator systems, which have to integrate distinct and possibly multiresolution data sources in order to process global query requests or update operations over different GISs. In this paper we do not consider this particular issue, which requires an ad hoc analysis; we refer the reader to [9] for additional details. In all the contexts presented above, in order to use multiresolution maps consistently, the problem arises of evaluating data integrity, that is, establishing whether the maps may lead to contradictory results when a user performs some kind of analysis over them. Ad hoc methods are therefore needed for checking consistency and similarity between multiresolution maps. Consistency and similarity can be defined by considering various map properties. Choosing a property corresponds to focusing on a specific abstract model for multiresolution maps and testing whether the information they contain is contradictory with respect to that property. Even though both geometric and topological properties are available in a multiresolution map, it seems reasonable to discard geometry. Indeed, geometric consistency would reduce to an equality test between two geometric map representations, and similarity would require some measure of object extent in order to compare the geometric changes between two objects; in both cases, these properties cannot be preserved after a change of resolution (i.e., of dimension). Based on these considerations, topological map representations seem more suitable for checking consistency and similarity. Indeed, topological representations are more abstract than geometric ones and describe properties that are preserved after changes in object dimension or, at least, are transformed according to some predefined rules.
Moreover, topological relationships describe fundamental spatial properties that are frequently used by end users to query and analyze geographic information. They have therefore been extensively analyzed in the literature [6, 7, 3] and have been adopted by the query languages of many GISs (e.g., Oracle 10g [12], ArcGIS ESRI [1]) and supported by Java APIs such as the JTS Topology Suite [10]. Unfortunately, previous work on topological map similarity has considered only single-resolution maps [5, 4, 2, 8]. Thus, topological similarity between relations holding between pairs of objects with different dimensions has not been considered at all. We claim that investigating this issue is important in order to effectively and efficiently support all the application contexts cited above. The aim of this paper is therefore to: (i) extend the existing approaches for checking map consistency and similarity at the topological level; (ii) experimentally validate the proposed concepts through several tests over real and synthetic data; (iii) apply these notions in the scenarios identified above. More precisely, the contributions of this paper can be summarized as follows:
• We provide an abstract model for geographical maps, based on topological properties. The considered topological representation adopts the well-known 9-intersection model of Egenhofer et al. [7]. Thus each generalized map can be topologically described by a set of 9-intersection matrices between pairs of map objects. Since this approach defines a huge number of relations, we group together similar relations, obtaining a partition of them as done in [3]. This is the approach followed by many GISs (e.g., Oracle 10g, ArcGIS ESRI, etc.) and Java APIs. The proposed model supports the representation of multiresolution maps.
• We provide notions of consistency for multiresolution maps. Since topological relations can change with the geometric dimension of objects (0: point, 1: line, 2: region), we need to investigate how topological relations are altered by changes in object dimension. For example, given two adjacent map objects of dimension 2 (regions), we may want to know what their topological relationship becomes if they are generalized as lines in another map. Since the same relation may not always exist after the dimension change, a strong concept of consistency (called equality-based consistency), which requires the preservation of all relations, is not applicable in practice. Thus, we introduce a notion of distance-based consistency, which requires each relation to be transformed into its 'nearest neighbor' relations according to a topological distance between topological predicates applied over pairs of features, possibly having different dimensions.
• We provide a notion of similarity between multiresolution maps, based on the proposed distance among topological relationships involving objects of different dimensions. A ranking approach is also provided that returns pairs of objects in decreasing order with respect to their topological similarity. Such a ranking can be useful in determining which pairs of objects contribute most to making two maps different.
• We report the results of experiments conducted over synthetic and real datasets in order to check the effectiveness of the proposed consistency and similarity notions. The results of these experiments are also discussed to suggest how the proposed concepts can be applied in query processing and analysis of multiresolution maps.
The paper is organized as follows. Section 2 discusses previous work on topological consistency and similarity.
In Section 3, we briefly illustrate the data model adopted for representing multiresolution maps. In Section 4, we present the definition of equality-based and distance-based consistency together with a topological distance between topological relations, whereas a similarity function between multiresolution maps is presented in Section 5. Experimental results are then discussed in Section 6. Finally, Section 7 presents some conclusions and outlines future work.

2. RELATED WORK

The most relevant papers concerning the representation of topological properties are [6, 7, 3]. In the first of them [6], which introduces the 4-intersection model, spatial objects are modeled as point-sets describing their interior and their boundary. Only regions are considered, and topological relations are defined as 2x2 matrices containing the value empty or non-empty for the intersections of the point-sets of the objects involved. The 4-intersection model was subsequently extended by considering, besides interior and boundary, also the object exterior [7]. Thus, each object is represented using three point-sets: interior, boundary, and exterior. As a result, topological relations are represented by 9 intersections (a 3x3 matrix). This model, known as the 9-intersection model, defines a formal categorization of binary topological relations between regions, lines, and points. In [3], the 9-intersection model has been extended by considering the dimension of each intersection. Moreover, in the same paper the authors provide a way of categorizing the resulting relationships into five intuitive and mutually exclusive topological relations (Disjoint, Touch, In, Cross, Overlap). These relationships are now at the basis of the query processors of well-known spatial systems, such as Oracle 10g [12] and ArcGIS ESRI [1], and of spatial Java APIs, such as the JTS Topology Suite [10]. Multiresolution has mainly been considered in the definition of ad hoc operators to generalize/specialize map objects. In [5], multirepresentation is interpreted as a set of different levels S_0, ..., S_n, where each level is a set of spatial objects sharing some topological relations. Level S_{i+1} is derived from level S_i by using some generalization operations. The paper mainly deals with topological relations between regions. The proposed operations are guaranteed to transform a map into another map that is consistent with the first one with respect to topological relations, but object dimension is not considered. A similarity measure between maps, defined as a deviation from consistency, has also been provided. In [15], a formal model based on planar abstract cell complexes for representing multiresolution maps is proposed. Moreover, the authors present a set of generalization operations that allow one to simplify a map while preserving consistency. The proposed consistency test is defined at the combinatorial level by means of homeomorphisms. In [4], an important framework is presented, defining a method for checking similarity between topological relations of regions defined according to the 9-intersection model. Similarity is computed by comparing two 9-intersection configurations and counting the number of intersections whose values differ (topology distance).
Using this function, a partial order over topological relations between regions can be defined and used to evaluate how far two relations are from each other. In [2], topology distance is used to evaluate the similarity of spatial scenes, also taking into account direction and distance relations. In [8], topology distance is used to define a model (the snapshot model) for comparing two different topological relations between lines and regions. In all the papers cited above, similarity is computed only between pairs of features with the same dimension; multiple feature representations are not considered at all. Two different consistency issues are addressed in [13, 14]. In [13], consistency among networks, defined as sets of lines (homogeneous networks) and sets of lines and regions (heterogeneous networks), is investigated by providing several sufficient conditions. In [14], the problem of checking consistency under aggregation operations, which merge together disconnected parts of the same region, is considered; the proposed approach relies on the 4-intersection model. Neither approach considers changes in object dimension, and neither provides a similarity measure.

3. THE REFERENCE MODEL

We define a map schema as a set of feature types, i.e., object classes representing real-world entities (such as lakes, rivers, etc.). Each feature type has some descriptive attributes, including a feature identifier and a spatial attribute having a given dimension. We assume that values for the spatial attribute are modeled according to the OGC (Open Geospatial Consortium) simple feature geometric model [11]. In this model, the geometry of an object can be of type point (dimension 0, also denoted by P), line (dimension 1, also denoted by L) or polygon, more generally called region (dimension 2, also denoted by R), or it can recursively be a collection of disjoint geometries. A point describes a single location in the coordinate space; a line represents a linear interpolation of an ordered sequence of points; a polygon is defined as an ordered sequence of closed lines defining the exterior and interior boundaries (holes) of an area. Given a set of map schemas MS1, ..., MSn and a set of feature types ft1, ..., ftm, we assume that a feature type may belong to one or more map schemas, possibly with different dimensions. For example, the feature type river may belong to MS1 with dimension 1 (i.e., rivers are represented as lines in MS1) and to MS2 with dimension 2 (i.e., rivers are represented as regions in MS2). Thus, each feature type is assigned multiple dimensions, one for each map schema in which it appears. The dimension of a feature type ft in a map schema MSi is denoted by dim(ft, MSi). An instance of a map schema is called a map and is a set of features, i.e., instances of the feature types belonging to the map schema. Given a set of maps M1, ..., Mn, instances of the map schemas MS1, ..., MSn respectively, a feature fj of a feature type fti may belong to one or more maps, associated with possibly different geometries and dimensions according to the map schemas. When this happens, we say that M1, ..., Mn are multiresolution maps. For example, following the previous example, the Danube river may appear in an instance of MS1 represented as a line and in an instance of MS2 represented as a region. The set of instances of a feature type ft in a map Mi is denoted by ext(ft, Mi). We assume that an identifier is associated with each feature.
This identifier is used to determine whether a feature belongs to a map. Based on the dimension assigned to feature types in maps, we can define a generalization (specialization) relation between maps as follows.

Definition 1 (Generalization). Let MS1 and MS2 be two map schemas and ft ∈ MS1 ∩ MS2. MS1 generalizes MS2 (MS2 specializes MS1) with respect to ft if dim(ft, MS1) < dim(ft, MS2). MS1 generalizes MS2 (MS2 specializes MS1) if the following conditions hold:
• ∃ft ∈ MS1 ∩ MS2 such that MS1 generalizes MS2 (MS2 specializes MS1) with respect to ft;
• ∀ft ∈ MS1 ∩ MS2, dim(ft, MS1) = dim(ft, MS2) or MS1 generalizes MS2 (MS2 specializes MS1) with respect to ft. □

Features inside a map are related by topological relationships, described in the topological layer. Topological relationships can be formally defined by using the 9-intersection model [7]. In the 9-intersection model, each spatial object A is represented by 3 point-sets: its interior A◦, its exterior A− and its boundary ∂A. The definition of binary topological relations between two spatial objects A and B is based on the 9 intersections of the components of the two objects. Thus, a topological relation can be represented as a 3x3 matrix, called the 9-intersection matrix, defined as follows:

$$
R(A,B) = \begin{pmatrix}
A^\circ \cap B^\circ & A^\circ \cap \partial B & A^\circ \cap B^- \\
\partial A \cap B^\circ & \partial A \cap \partial B & \partial A \cap B^- \\
A^- \cap B^\circ & A^- \cap \partial B & A^- \cap B^-
\end{pmatrix}
$$
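Definition 1 can be checked mechanically once each schema is reduced to the dimensions of its feature types. The following Python sketch assumes that a map schema is given as a plain dictionary from feature-type name to dimension (0 = point, 1 = line, 2 = region); the helper names `generalizes_wrt` and `generalizes` are hypothetical and not part of the paper.

```python
# Hypothetical representation: a map schema is a dict mapping feature-type
# names to geometric dimensions (0 = point, 1 = line, 2 = region).

def generalizes_wrt(ms1: dict, ms2: dict, ft: str) -> bool:
    """MS1 generalizes MS2 with respect to ft if dim(ft, MS1) < dim(ft, MS2)."""
    return ft in ms1 and ft in ms2 and ms1[ft] < ms2[ft]

def generalizes(ms1: dict, ms2: dict) -> bool:
    """Definition 1: at least one common feature type is generalized and
    every other common feature type keeps the same dimension."""
    common = ms1.keys() & ms2.keys()
    some_generalized = any(generalizes_wrt(ms1, ms2, ft) for ft in common)
    # "equal or generalized" for every common type is the same as dim1 <= dim2
    none_specialized = all(ms1[ft] <= ms2[ft] for ft in common)
    return some_generalized and none_specialized

ms1 = {"river": 1, "lake": 2}             # rivers drawn as lines
ms2 = {"river": 2, "lake": 2, "road": 1}  # rivers drawn as regions
print(generalizes(ms1, ms2))  # True
print(generalizes(ms2, ms1))  # False
```

Note that two schemas can be incomparable: if one feature type is generalized and another is specialized, neither schema generalizes the other.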

Considering the value empty (∅) or non-empty (¬∅) for each of the 9 intersections, one can distinguish many relationships between regions, lines, and points embedded in R². These relations are mutually exclusive and provide a complete coverage. In [3], this model has been extended by considering the dimension of each of the 9 intersections, giving rise to the extended 9-intersection model. Since the number of such relationships is quite high, a partition of the extended 9-intersection matrices has been proposed, grouping together similar matrices and assigning a name to each group. The result is the following set of binary, mutually exclusive topological relationships, refining those presented in [3]: REL = {Disjoint, Touch, In, Contains, Equal, Cross, Overlap, Covers, CoveredBy}.¹ These topological relations, except Covers and CoveredBy, are those supported by well-known Java APIs such as the JTS Topology Suite [10]. Other partitions of 9-intersection matrices are, however, possible; see, for example, the one supported by Oracle 10g [12]. The semantics of the topological predicates of REL is provided in Table 1. As Table 1 shows, not all relationships can be defined for every pair of dimensions. Therefore, given two dimensions d1, d2 ∈ {R, L, P}, we denote by REL(d1, d2) the set of topological relationships that can be defined for d1 and d2. Moreover, we denote by I9(θ, d1, d2) the set of 9-intersection matrices defining predicate θ ∈ REL between two objects of dimension d1 and d2. Besides a logical formalization of the predicates, the table also provides the pattern representing the structure of I9(θ, d1, d2). The pattern is a string P1,1 P1,2 P1,3 P2,1 P2,2 P2,3 P3,1 P3,2 P3,3, where element Pi,j corresponds to cell (i, j) in the 9-intersection matrix.
The symbol '-' means that the corresponding position is not relevant in defining the topological relation; F and T mean that the intersection is empty or non-empty, respectively (whatever its dimension); x ∈ {0, 1, 2} means that the intersection is non-empty and has dimension x. The dimension of the intersection is needed only to discriminate between Cross and Overlap for L/L; in all the other cases, dimension is not required. In the following, we denote by f1 and f2 two features and by M1, M2 two maps such that: (i) f1 θ1 f2 holds in M1, θ1 ∈ REL(d1, d2); (ii) f1 θ2 f2 holds in M2, θ2 ∈ REL(d3, d4).

4. TOPOLOGICAL CONSISTENCY

When two maps have a set of features in common, possibly represented with different dimensions, the problem arises of establishing whether such maps are consistent, i.e., whether they represent common objects in a consistent way. In this paper we consider topological consistency, i.e., consistency with respect to the topological relationships existing between map objects. Informally, two maps are topologically consistent when every pair of common features is related by the same or a similar topological relationship in both maps. Different types of consistency can be defined: equality-based (eq-based) consistency and distance-based (dist-based) consistency. Eq-based consistency requires that the same topological relationship exists between each pair of common features in the two maps (and therefore the feature dimensions in the two maps must allow the definition of such topological relationships). For example, considering the maps in Figure 1 and assuming they contain just o1 and o2, we note that M1 is eq-based consistent only with M2 and M5.

[Figure 1: Topological relationships for features o1 and o2 in maps M1, ..., M8; for each map, the object dimensions are also reported. Panels: M1 (2,2,overlap), M2 (2,2,overlap), M3 (2,2,touch), M4 (2,1,cross), M5 (1,1,overlap), M6 (1,1,cross), M7 (2,2,disjoint), M8 (1,1,disjoint).]

¹ Covers and CoveredBy are here defined as refinements of the relations Contains and In and are not considered in [3].

Definition 2 (Eq-based consistency). f1 and f2 in map M1 are eq-based consistent with f1 and f2 in map M2 if f1 θ f2 holds in both M1 and M2. M1 and M2 are eq-based consistent if, for any pair of features (f1, f2) ∈ (M1 ∩ M2)², f1 and f2 in map M1 are eq-based consistent with f1 and f2 in map M2. □

It is easy to see that eq-based consistency between pairs of features cannot always be guaranteed, because not all topological relationships are defined for all possible pairs of dimensions (see Table 1). For example, Overlap is defined between pairs of regions and between pairs of lines, but not between a line and a region; in this case, Cross can be used. From Figure 1 and Table 1, we also note that Overlap and Cross are quite similar. It follows that equality of topological relationships is too strong a criterion for defining consistency. This criterion can, however, be relaxed by considering a distance between relations, i.e., their similarity. The new notion of consistency, which we call dist-based consistency, is always defined and requires that the topological relationships between two features in two different maps be, not necessarily equal, but the most similar ones. Dist-based consistency seems the most reasonable choice in real situations dealing with multiresolution, where eq-based consistency cannot always be guaranteed. In order to formally define dist-based consistency, we need to define a distance function between topological relationships.
Since each topological relationship in REL corresponds to a set of 9-intersection matrices, we first need to define a distance function between two 9-intersection matrices, and then use this function to compute the distance between relationships.

Name | Definition | Object types | Pattern
Disjoint (d) | f1 ∩ f2 = ∅ | All | FFTFFTTTT
Touch (t) | (f1◦ ∩ f2◦ = ∅) ∧ (f1 ∩ f2 ≠ ∅) | R/R, R/L, R/P, L/L, L/P | FT------T or F--T----T or F---T---T
In (i) | (f1 ∩ f2 = f1) ∧ (f1◦ ∩ f2◦ ≠ ∅) | R/R, L/L, L/R, P/R, P/L | T-F-FFTTT
Contains (c) | (f1 ∩ f2 = f2) ∧ (f1◦ ∩ f2◦ ≠ ∅) | R/R, R/L, R/P, L/L, L/P | T-T-FTFFT
Equal (e) | f1 = f2 | R/R, L/L, P/P | TFFFTFFFT
Cross (r) | dim(f1◦ ∩ f2◦) = max(dim(f1◦), dim(f2◦)) − 1 ∧ (f1 ∩ f2 ≠ f1) ∧ (f1 ∩ f2 ≠ f2) | L/R, L/L | T-T---T-T (L/R); 0-T---T-T (L/L)
Overlap (o) | dim(f1◦) = dim(f2◦) = dim(f1◦ ∩ f2◦) ∧ (f1 ∩ f2 ≠ f1) ∧ (f1 ∩ f2 ≠ f2) | R/R, L/L | T-T---T-T (R/R); 1-T---T-T (L/L)
Covers (v) | (f2 ∩ f1 = f2) ∧ (f2◦ ∩ f1◦ ≠ ∅) ∧ ((f1 − f1◦) ∩ (f2 − f2◦) ≠ ∅) | R/R, R/L, R/P, L/L, L/P | T-T-TTFFT
CoveredBy (vb) | (f1 ∩ f2 = f1) ∧ (f1◦ ∩ f2◦ ≠ ∅) ∧ ((f1 − f1◦) ∩ (f2 − f2◦) ≠ ∅) | R/R, L/L, L/R, P/R, P/L | T-F-TFTTT

Table 1: Definition of the reference set of topological relationships

In defining the distance between two 9-intersection matrices ψ1 and ψ2 (denoted by d9(ψ1, ψ2)), we adopt the approach proposed in [4] and define it as the ratio between the number of cells that differ in the two matrices and the total number of cells (9). Two cells are considered different if one corresponds to a non-empty intersection (whatever its dimension) and the other to an empty intersection.² Note that the dimension of the intersection is not taken into account when computing distances, since dimension is used in the definition of only two topological predicates (as shown in Table 1) and is thus not an important property for the considered set of topological relations. Based on this distance, the distance between two relationships in REL can be computed as the minimum distance between any 9-intersection matrix defining the first relationship and any 9-intersection matrix defining the second one. We have chosen the minimum (and not the average or other functions) since we are interested in maximizing similarity among topological relations after dimension changes.

Definition 3 (Distance). Let θ1 ∈ REL(d1, d2) and θ2 ∈ REL(d3, d4). The distance between θ1 and θ2, denoted by d(θ1, (d1, d2), θ2, (d3, d4)), is defined as follows:

$$
d(\theta_1, (d_1, d_2), \theta_2, (d_3, d_4)) = \min\{\, d_9(\psi_1, \psi_2) \mid \psi_1 \in I_9(\theta_1, d_1, d_2),\ \psi_2 \in I_9(\theta_2, d_3, d_4) \,\}
$$
□

Example 1. Suppose we want to compute d(Contains, (R, L), In, (L, R)). Based on Table 1, it can be shown that Contains over (R, L) corresponds to the following two 9-intersection matrices:

$$
\mathit{Contains}_1 = \begin{pmatrix} \neg\emptyset & \neg\emptyset & \neg\emptyset \\ \emptyset & \emptyset & \neg\emptyset \\ \emptyset & \emptyset & \neg\emptyset \end{pmatrix}
\qquad
\mathit{Contains}_2 = \begin{pmatrix} \neg\emptyset & \neg\emptyset & \neg\emptyset \\ \neg\emptyset & \emptyset & \neg\emptyset \\ \emptyset & \emptyset & \neg\emptyset \end{pmatrix}
$$

² Since the boundary of points is empty, we do not consider the row and the column of the matrix corresponding to the boundary when one of the two objects is a point.

On the other hand, In over (L, R) corresponds to the following two 9-intersection matrices:

$$
\mathit{In}_1 = \begin{pmatrix} \neg\emptyset & \emptyset & \emptyset \\ \neg\emptyset & \emptyset & \emptyset \\ \neg\emptyset & \neg\emptyset & \neg\emptyset \end{pmatrix}
\qquad
\mathit{In}_2 = \begin{pmatrix} \neg\emptyset & \neg\emptyset & \emptyset \\ \neg\emptyset & \emptyset & \emptyset \\ \neg\emptyset & \neg\emptyset & \neg\emptyset \end{pmatrix}
$$

In order to compute their distance, we first compute all the distances between a matrix for Contains and a matrix for In, i.e., d9(Contains_i, In_j), i, j = 1, 2, and then take the minimum value. For example, d9(Contains_1, In_1) = 6/9, since 6 positions out of 9 differ. Similarly, d9(Contains_1, In_2) = 5/9, d9(Contains_2, In_1) = 5/9, and d9(Contains_2, In_2) = 4/9. Thus, d(Contains, (R, L), In, (L, R)) = 4/9. □

Dist-based consistency can now be defined by requiring that the topological relationships between pairs of features in two distinct maps be the most similar ones, according to the function d introduced above.

Definition 4 (Dist-based consistency). f1 and f2 in map M1 are dist-based consistent with f1 and f2 in map M2 if f1 θ1 f2 holds in M1, f1 θ2 f2 holds in M2, and d(θ1, (d1, d2), θ2, (d3, d4)) coincides either with min{d(θ1, (d1, d2), θ3, (d3, d4)) | θ3 ∈ REL(d3, d4)} or with min{d(θ2, (d3, d4), θ3, (d1, d2)) | θ3 ∈ REL(d1, d2)}. M1 and M2 are dist-based consistent if ∀(f1, f2) ∈ (M1 ∩ M2)², f1 and f2 in map M1 are dist-based consistent with f1 and f2 in map M2. We denote by dc((d1, d2), θ1, (d3, d4)) the set of all relations in REL(d3, d4) that are dist-based consistent with θ1 ∈ REL(d1, d2). □

Based on the previous definition, and considering again Figure 1, M1 is dist-based consistent with M2, M5 and M6. Notice that M6 is not eq-based consistent with M1, while M2 and M5 are. Tables 2 and 3 present all possible values of d(θ1, (d1, d2), θ2, (d3, d4)) (multiplied by 9 for readability), highlighting with a small square the minimum distance, which identifies for each relation its 'nearest relations', i.e., the dist-based consistent ones. The following proposition can easily be proved based on Definition 4.
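The matrix distance d9, the relation distance of Definition 3 and the set dc of Definition 4 can be sketched as follows. The code assumes that matrices are encoded as tuples of nine booleans (True for ¬∅), ignoring intersection dimensions as the text prescribes, and it reproduces Example 1. For dc it uses only the block of distances between (R,L) and (R,R) relations transcribed from Table 2 (values times 9), so `RL_VS_RR`, `dist` and `dc_from_rl` are hypothetical helpers rather than part of the paper.

```python
from fractions import Fraction
from itertools import product

# Each 9-intersection matrix is a tuple of nine booleans in row-major order
# (True = non-empty, False = empty); intersection dimensions are ignored.
E, N = False, True  # E stands for ∅, N for ¬∅

def d9(psi1, psi2) -> Fraction:
    """Fraction of the 9 cells on which two matrices differ."""
    return Fraction(sum(a != b for a, b in zip(psi1, psi2)), 9)

def d(matrices1, matrices2) -> Fraction:
    """Definition 3: minimum d9 over every pair of defining matrices."""
    return min(d9(p1, p2) for p1, p2 in product(matrices1, matrices2))

# Example 1: Contains over (R, L) and In over (L, R).
CONTAINS_RL = [
    (N, N, N, E, E, N, E, E, N),  # Contains1
    (N, N, N, N, E, N, E, E, N),  # Contains2
]
IN_LR = [
    (N, E, E, N, E, E, N, N, N),  # In1
    (N, N, E, N, E, E, N, N, N),  # In2
]
print(d(CONTAINS_RL, IN_LR))  # 4/9

# Distances (times 9) between (R,L) relations (rows) and (R,R) relations
# (columns), transcribed from Table 2.
RR = ["Disjoint", "Touch", "In", "CoveredBy", "Equal", "Contains", "Covers", "Overlap"]
RL_VS_RR = {
    "Disjoint": [0, 1, 4, 5, 6, 4, 5, 4],
    "Touch":    [1, 0, 3, 3, 4, 4, 3, 2],
    "Contains": [4, 5, 5, 6, 4, 0, 1, 3],
    "Covers":   [4, 3, 5, 4, 2, 1, 0, 2],
    "Cross":    [2, 2, 2, 2, 4, 2, 2, 1],
}

def dist(rl: str, rr: str) -> int:
    """Tabulated distance (times 9) between an (R,L) and an (R,R) relation."""
    return RL_VS_RR[rl][RR.index(rr)]

def dc_from_rl(theta1: str) -> set:
    """Definition 4: the (R,R) relations that are dist-based consistent with
    the (R,L) relation theta1."""
    forward_min = min(dist(theta1, t) for t in RR)
    result = set()
    for theta2 in RR:
        # Smallest distance from theta2 to any (R,L) relation.
        backward_min = min(dist(t, theta2) for t in RL_VS_RR)
        if dist(theta1, theta2) in (forward_min, backward_min):
            result.add(theta2)
    return result

print(sorted(dc_from_rl("Cross")))  # ['CoveredBy', 'In', 'Overlap']
```

With these transcribed values, dc((R,L), Cross, (R,R)) contains three relations, which illustrates the many-to-many behavior stated in Proposition 1; Touch, instead, maps only to Touch.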

Column groups (left to right): (R,R): d t i vb e c v o | (R,L): d t c v r | (R,P),(L,P): d t c | (L,R): d t i vb r. Each row lists its distances to the column groups up to and including its own group.

(R,R) d: 0 1 4 5 6 4 5 4
(R,R) t: 1 0 5 4 5 5 4 3
(R,R) i: 4 5 0 1 4 6 7 4
(R,R) vb: 5 4 1 0 3 7 6 3
(R,R) e: 6 5 4 3 0 4 3 6
(R,R) c: 4 5 6 7 4 0 1 4
(R,R) v: 5 4 7 6 3 1 0 3
(R,R) o: 4 3 4 3 6 4 3 0
(R,L) d: 0 1 4 5 6 4 5 4 | 0 1 4 4 2
(R,L) t: 1 0 3 3 4 4 3 2 | 1 0 3 1 1
(R,L) c: 4 5 5 6 4 0 1 3 | 4 3 0 1 1
(R,L) v: 4 3 5 4 2 1 0 2 | 4 1 1 0 1
(R,L) r: 2 2 2 2 4 2 2 1 | 2 1 1 1 0
(R,P),(L,P) d: 0 0 4 4 4 2 2 2 | 0 0 2 2 2 | 0 2 2
(R,P),(L,P) t: 2 2 4 4 4 2 2 2 | 2 0 1 1 2 | 2 0 2
(R,P),(L,P) c: 2 2 4 4 2 0 0 2 | 2 2 0 0 2 | 2 2 0
(L,R) d: 0 1 4 5 6 4 5 4 | 0 1 4 4 2 | 0 2 2 | 0 1 4 4 2
(L,R) t: 1 0 4 3 4 3 3 2 | 1 0 3 3 2 | 0 2 2 | 1 0 3 1 1
(L,R) i: 4 5 0 1 4 5 6 3 | 4 3 4 5 2 | 4 4 4 | 4 3 0 1 1
(L,R) vb: 4 3 1 0 2 5 4 2 | 4 3 5 4 2 | 3 4 3 | 4 1 1 0 1
(L,R) r: 2 2 2 2 4 2 2 1 | 2 2 2 2 0 | 1 2 1 | 2 1 1 1 0

Legend: d = Disjoint, t = Touch, i = In, vb = CoveredBy, e = Equal, c = Contains, v = Covers, o = Overlap, r = Cross

Table 2: Distance values (times 9) for topological relationships

Proposition 1. Dist-based consistency satisfies the following properties:
1. it is a many-to-many relationship, i.e., the cardinality of $dc((d_1,d_2),\theta_1,(d_3,d_4))$ may be greater than one;
2. it is symmetric, i.e., $\theta_2 \in dc((d_1,d_2),\theta_1,(d_3,d_4))$ if and only if $\theta_1 \in dc((d_3,d_4),\theta_2,(d_1,d_2))$;
3. it is reflexive, i.e., $dc((d_1,d_2),\theta,(d_1,d_2)) = \{\theta\}$ and $d(\theta,(d_1,d_2),\theta,(d_1,d_2)) = 0$;
4. eq-based consistency implies dist-based consistency, i.e., $\theta_1 \in dc((d_1,d_2),\theta_1,(d_3,d_4))$, except for the cases in which the dimensions of two objects in Touch change from (R,P) or (L,P) to (P,L) or (P,R).

Concerning item (1), Table 3 shows, for example, that dc((R,R), Overlap, (L,L)) = {Overlap, Cross}. Concerning item (4), it is important to note that the only cases in which eq-based consistency does not imply dist-based consistency do not represent generalization or specialization (see Table 3). This behavior is probably due to boundary information, which is quite relevant for the Touch relationship and is lost when a region is transformed into a point. As a final consideration, when determining map consistency it may be useful to determine which pairs of features contribute most to inconsistent situations, so a ranking among pairs of common features can be useful. Since dc is a many-to-many relation, it seems reasonable to use as ranking value the average difference between the distance of the two corresponding topological relationships in M1 and M2 and the distances between each of them and its dist-based consistent relationships.

Definition 5 (Consistency ranking value). The consistency ranking value for (f1, f2) with respect to M1, M2, denoted by r((f1, f2), M1, M2), is defined as

$$r((f_1,f_2),M_1,M_2) = \frac{\sum_{c \in DC} |d(\theta_1,(d_1,d_2),\theta_2,(d_3,d_4)) - c|}{|DC|}$$

where $DC = \{d(\theta_1,(d_1,d_2),\theta_3,(d_3,d_4)) \mid \theta_3 \in dc((d_1,d_2),\theta_1,(d_3,d_4))\} \cup \{d(\theta_4,(d_1,d_2),\theta_2,(d_3,d_4)) \mid \theta_4 \in dc((d_3,d_4),\theta_2,(d_1,d_2))\}$. □

5.

TOPOLOGICAL MAP DISTANCE

The distance function for dist-based consistency introduced in Section 4 can be used to define a distance function between maps. To this aim, we consider three distinct aspects, leading to three distinct distances:

• the dimension of feature types that are common to both maps (dimensional distance), to quantify how much the geometric representation has changed (without accessing the actual geometry);
• the distance between the topological relationships of pairs of features belonging to both maps (topological distance), to quantify how much the topological representation has changed;
• the set of features appearing in just one map (content distance), to quantify the portion of space that is not in common.

For each aspect, we first define a sub-function; we then combine all sub-functions into a single parametric definition. All functions return values between 0 and 1.

The dimensional distance quantifies the difference in dimension of the features appearing in both maps. We compute it as the number of generalization/specialization steps required to transform one map into the other (assuming that each step applies to a single feature and changes its dimension by one), divided by the maximum number of possible generalization/specialization steps. This maximum equals the number of common features multiplied by 2, since at most 2 generalization/specialization steps can be applied to each feature (from region to point or vice versa).

Definition 6 (Dimensional distance). Let MS1, MS2 be two map schemas. For ft ∈ MS1 ∩ MS2, let dd(ft, MS1, MS2) = |dim(ft, MS1) − dim(ft, MS2)|. The dimensional distance of two maps M1, M2, instances of MS1 and MS2 respectively, is defined as follows:

$$M_{dd}(M_1,M_2) = \frac{1}{2|M_1 \cap M_2|} \sum_{ft \in MS_1 \cap MS_2} |ext(ft,M_1) \cap ext(ft,M_2)| \cdot dd(ft,MS_1,MS_2)$$ □
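Definition 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: dimensions are encoded as 0 (point), 1 (line) and 2 (region); the caller supplies, for each feature type shared by the two schemas, its dimension in each schema and the number of features of that type present in both maps; and |M1 ∩ M2| is taken to be the sum of those counts. The function name, argument names and the counts in the usage lines are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding of geometry types as dimensions.
POINT, LINE, REGION = 0, 1, 2


def dimensional_distance(dims1, dims2, common_counts):
    """Sketch of M_dd (Definition 6).

    dims1, dims2: feature type -> dimension in MS1 / MS2.
    common_counts: feature type in MS1 ∩ MS2 -> |ext(ft, M1) ∩ ext(ft, M2)|.
    Assumes |M1 ∩ M2| is the sum of these counts.
    """
    total_common = sum(common_counts.values())
    if total_common == 0:
        return Fraction(0)  # no common features: nothing to compare
    # Each common feature contributes dd(ft, MS1, MS2) generalization/specialization steps.
    steps = sum(n * abs(dims1[ft] - dims2[ft]) for ft, n in common_counts.items())
    # At most 2 steps per common feature (region <-> point).
    return Fraction(steps, 2 * total_common)


# Hypothetical counts: railways become lines, roads and provinces stay regions.
dims_m1 = {"railways": REGION, "roads": REGION, "provinces": REGION}
dims_m2 = {"railways": LINE, "roads": REGION, "provinces": REGION}
counts = {"railways": 10, "roads": 30, "provinces": 20}
print(dimensional_distance(dims_m1, dims_m2, counts))  # 1/12
```

Here 10 railway features each need one step, out of a maximum of 2 × 60 steps, giving 10/120 = 1/12. Exact fractions are used only to keep the arithmetic easy to check.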

Columns (dimension pair, then relation), in order:
(R,R): d t i vb e c v o | (R,L): d t c v r | (R,P),(L,P): d t c | (L,R): d t i vb r | (L,L): d t i vb e c v o r | (P,R),(P,L): d t i | (P,P): d e

Rows (dimension pair, relation), values in the column order above; "-" marks a cell with no value:
(L,L) d: 0 1 4 5 6 4 5 4 | 0 1 4 4 2 | 0 2 2 | 0 1 4 4 2 | 0 1 4 5 6 4 5 1 1 | - - - | - -
(L,L) t: 1 0 2 2 3 2 2 2 | 1 0 2 2 1 | 0 1 2 | 1 0 2 2 1 | 1 0 2 2 3 2 2 1 1 | - - - | - -
(L,L) i: 4 3 0 1 4 6 7 4 | 4 3 5 5 2 | 4 4 4 | 4 4 0 1 2 | 4 2 0 1 4 6 7 1 1 | - - - | - -
(L,L) vb: 5 3 1 0 3 7 6 3 | 5 3 6 4 2 | 4 4 4 | 5 3 1 0 2 | 5 2 1 0 3 7 6 1 1 | - - - | - -
(L,L) e: 6 4 4 3 0 4 3 6 | 6 4 4 2 4 | 4 4 2 | 6 4 4 2 4 | 6 3 4 3 0 4 3 2 2 | - - - | - -
(L,L) c: 4 4 6 7 4 0 1 4 | 4 4 0 1 2 | 2 2 0 | 4 3 5 5 2 | 4 2 6 7 4 0 1 1 1 | - - - | - -
(L,L) v: 5 3 7 6 3 1 0 3 | 5 3 1 0 2 | 2 2 0 | 5 3 6 4 2 | 5 2 7 6 3 1 0 5 5 | - - - | - -
(L,L) o: 1 1 1 1 2 1 1 1 | 1 1 1 1 0 | 1 2 1 | 1 1 1 1 0 | 1 1 1 1 2 1 1 0 0 | - - - | - -
(L,L) r: 1 1 1 1 2 1 1 1 | 1 1 1 1 0 | 1 2 1 | 1 1 1 1 0 | 1 1 1 1 2 1 1 0 0 | - - - | - -
(P,L),(P,R) d: 0 0 2 2 4 4 4 2 | 0 0 4 3 1 | 0 1 2 | 0 0 2 2 2 | 0 0 2 2 4 4 4 1 1 | 0 2 2 | - -
(P,L),(P,R) t: 2 2 2 2 4 4 4 2 | 2 2 4 4 2 | 1 2 3 | 2 0 1 1 2 | 2 1 2 2 4 4 4 2 2 | 2 0 2 | - -
(P,L),(P,R) i: 2 2 0 0 2 4 4 2 | 2 2 4 3 1 | 2 3 2 | 2 2 0 0 2 | 2 2 0 0 2 4 4 1 1 | 2 2 0 | - -
(P,P) d: 0 0 2 2 3 2 2 1 | 0 0 2 2 1 | 0 1 2 | 0 0 2 2 1 | 0 0 2 2 3 2 2 1 1 | 0 1 2 | 0 3
(P,P) e: 3 3 1 1 0 1 1 2 | 3 2 1 1 2 | 3 2 1 | 3 2 1 1 2 | 3 3 1 1 0 1 1 2 2 | 3 2 1 | 3 0

Legend: d = Disjoint, t = Touch, i = In, vb = CoveredBy, e = Equal, c = Contains, v = Covers, o = Overlap, r = Cross

Table 3: Distance values (times 9) for topological relationships

The topological distance between a pair of features in two distinct maps can be defined from the distance between the topological relationships that hold between them in each map. The topological distance between two maps is then the average topological distance over all pairs of features belonging to both maps.

Definition 7 (Topological distance). Let M1, M2 be two maps with schemas MS1, MS2, respectively. Let {f1, f2} ⊆ M1 ∩ M2, with fi of feature type fti, i = 1, 2. Suppose that f1 θj f2 holds in Mj, j = 1, 2. The topological distance of (f1, f2) in M1 and M2 is defined as

$$td((f_1,f_2),M_1,M_2) = d(\theta_1, (dim(ft_1,MS_1), dim(ft_2,MS_1)), \theta_2, (dim(ft_1,MS_2), dim(ft_2,MS_2)))$$

The topological distance of maps M1 and M2 is defined as

$$M_{td}(M_1,M_2) = \frac{\sum_{(f_1,f_2) \in (M_1 \cap M_2)^2} td((f_1,f_2),M_1,M_2)}{|M_1 \cap M_2|^2}$$ □

The third function computes the distance of two maps as the fraction of features belonging to just one map with respect to the total number of features.

Definition 8 (Content distance). Let M1, M2 be two maps with schemas MS1, MS2, respectively. The content distance of M1 and M2 is defined as

$$M_{cd}(M_1,M_2) = \frac{|(M_1 - M_2) \cup (M_2 - M_1)|}{|M_1 \cup M_2|}$$ □
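Definitions 7 and 8 can be sketched as follows, under stated assumptions: features are identified by hashable ids, and td is passed in as a function, leaving to the caller how it looks up d in Tables 2 and 3. Following the formula literally, the average runs over all ordered pairs in (M1 ∩ M2)², including pairs of a feature with itself. All names and data below are hypothetical.

```python
from fractions import Fraction
from itertools import product


def content_distance(ids1, ids2):
    """Sketch of M_cd (Definition 8): features in just one map / features in either map."""
    union = ids1 | ids2
    if not union:
        return Fraction(0)
    return Fraction(len(ids1 ^ ids2), len(union))  # ^ is the symmetric difference


def topological_distance(common_ids, td):
    """Sketch of M_td (Definition 7): mean of td over (M1 ∩ M2)^2.

    td(f1, f2) is assumed to return the topological distance of (f1, f2)
    in the two maps, as a value in [0, 1].
    """
    pairs = list(product(common_ids, repeat=2))
    if not pairs:
        return Fraction(0)
    return sum((td(f1, f2) for f1, f2 in pairs), Fraction(0)) / len(pairs)


m1 = {"a", "b", "c", "d"}
m2 = {"c", "d", "e"}
print(content_distance(m1, m2))  # 3/5


# Hypothetical td: only the relation between c and d changed, by a Table 3 entry of 1.
def td(f1, f2):
    return Fraction(1, 9) if {f1, f2} == {"c", "d"} else Fraction(0)


print(topological_distance(m1 & m2, td))  # 1/18
```

With two common features there are four ordered pairs, two of which contribute 1/9, so the average is 2/36 = 1/18.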

The functions introduced above can be used to define a parametric map distance function as follows.

Definition 9 (Map distance function). Let 0 ≤ α, β, γ ≤ 1 such that α + β + γ = 1. The map distance between maps M1 and M2 is defined as

$$M_{md}(M_1,M_2) = \alpha M_{cd}(M_1,M_2) + \beta M_{td}(M_1,M_2) + \gamma M_{dd}(M_1,M_2)$$ □

When determining map similarity, it may be useful to determine which pairs of objects contribute most to the distance value, so a ranking among pairs of features belonging to both maps is useful in this case as well. Since the only sub-function dealing with distance between pairs of features is the topological distance, it seems reasonable to define the similarity ranking value for features f1 and f2 as the distance between their corresponding topological relationships, i.e., their topological distance.

Definition 10 (Similarity ranking value). The similarity ranking value for (f1, f2) with respect to M1, M2 is defined as sr((f1, f2), M1, M2) = td((f1, f2), M1, M2). □
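Definition 9 is a weighted combination of the three sub-distances. A minimal sketch follows; the weight checks enforce the constraints of the definition, and the floating-point tolerance on the sum is our own choice. The usage lines plug in the component values reported for the temporal-evolution example in Section 6.4, with weights of exactly 1/3 (the paper states 0.333 each, which would not sum exactly to 1).

```python
import math


def map_distance(mcd, mtd, mdd, alpha, beta, gamma):
    """Sketch of M_md (Definition 9) from precomputed component distances."""
    weights = (alpha, beta, gamma)
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return alpha * mcd + beta * mtd + gamma * mdd


# Components reported in Section 6.4: Mcd = 0.61, Mtd = 1.21310595e-5, Mdd = 0.
md = map_distance(0.61, 1.21310595e-5, 0.0, 1 / 3, 1 / 3, 1 / 3)
print(round(md, 4))  # 0.2033, consistent with the reported 0.2
```

Because the topological and dimensional components are negligible in that example, the map distance is dominated by the content term.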

6.

APPLICATIONS AND EXPERIMENTAL RESULTS

In order to evaluate the effectiveness of the proposed concepts, we present some experimental results concerning three different application contexts: 1) multiresolution in query processing; 2) analysis of datasets obtained from different data sources; 3) analysis of the temporal evolution of map content. In performing the experiments, we used both real and synthetic geographic datasets (see Figure 2 for some snapshots). We consider:

• A set of maps of Italy, containing the main Italian roads ($I_{roads}$), the Italian railways ($I_{railways}$) and the Italian rivers ($I_{rivers}$) as lines, and the Italian provinces ($I_{prov}$) and the Italian lakes ($I_{lakes}$) as regions. Starting from these maps, we synthetically derived the corresponding region maps $I^R_{roads}$, $I^R_{railways}$ and $I^R_{rivers}$ by applying a buffer function.

• A set of maps of a municipality of Lombardy (Italy)³, containing the streets of the municipality represented as regions ($ML1^R_{roads}$) and the same streets represented as lines ($ML1^L_{roads}$).
• A set of maps of another municipality of Lombardy (Italy)⁴, showing the buildings and the streets, both as regions, at two different time instants: in 1993 ($LM2^{93}_{buildings}$ and $LM2^{93}_{streets}$) and in 2001 ($LM2^{01}_{buildings}$ and $LM2^{01}_{streets}$).

6.1

Experimental results

In order to analyze the properties of the proposed distance functions, we consider three main maps, obtained by combining maps of Italy in different ways:

• Map M1 contains the main Italian railways, roads, provinces, rivers, and lakes as regions, i.e., $M_1 = I^R_{railways} \cup I^R_{roads} \cup I_{prov} \cup I^R_{rivers} \cup I_{lakes}$.
• Map M2 contains the main Italian railways as lines, and roads and provinces as regions, i.e., $M_2 = I_{railways} \cup I^R_{roads} \cup I_{prov}$.
• Map M3 contains some Italian railways and roads as lines and all Italian provinces as regions, i.e., $M_3 = I'_{railways} \cup I'_{roads} \cup I_{prov}$, where $I'_{railways} \subset I_{railways}$ and $I'_{roads} \subset I_{roads}$.

From the previous schema definitions, and according to Definition 1, it follows that map M1 specializes maps M2 and M3, and map M2 specializes map M3. Table 4 reports the values we obtained for the three proposed distance functions. We can see that:

• Content distance (CD). M2 contains a subset of the objects of M1 (only roads, railways, and provinces). On the other hand, M2 and M3 contain the same feature types, but M3 contains only a subset of the existing roads and railways (that is, a subset of the objects in M2). Thus, according to the results reported in Table 4, M1 is more similar to M2 than to M3.
• Dimensional distance (DD). In M2 railways are represented as lines, whereas in M1 they are regions. Moreover, in M3 both railways and roads become lines. Therefore, the dimensional distance between M1 and M3 is higher than the dimensional distance between M1 and M2.
• Topological distance (TD). Topological distance values are very small, thus the maps are quite similar to each other (less than 1% of topological relationships have changed in all three cases). The topological distance between M1 and M3 is higher than the distance between M1 and M2 and between M2 and M3. This means that more topological relationships have changed in M3 with respect to M1.
This is probably due to the transformation of Overlap relations between regions, in M1 and M2, into Cross relations between lines, in M3.

³ These datasets were provided by COGEME Spa (Rovato Bs, Italy) in the context of the research project SpadaGIS funded by MIUR.
⁴ These datasets were provided by the Province of Brescia (Italy) in the context of the research project SpadaWEB funded by MIUR.

Maps      CD      DD      TD
M1, M3    0.794   0.460   0.0017
M1, M2    0.685   0.126   7.922×10^-4
M2, M3    0.479   0.283   3.763×10^-4

Table 4: Distance values for synthetic maps

6.2

Application 1: using multiresolution in query processing

The query processor of a multiresolution GIS has to identify an execution plan for a query. In a multiresolution context, this means that not only the operations and their application order have to be decided, but also the input datasets. Indeed, a given query can be executed either on a more generalized or on a more specialized map, according to the user's choices or to the optimization strategies of the query processor. When a query has been specified by considering a spatial representation of a given dimension, or the dimension of the spatial representation is not given at all, switching to a more specialized or more generalized map requires rewriting the topological relationships used in the selection conditions. Such reformulation is guided by the criteria of the dist-based consistency definition presented in Section 4: each topological relationship is substituted by one of the nearest relationships that can be defined after the dimension reduction (or increase). This can be exploited in distributed architectures, such as the Web, where fast access and low cost are critical issues. In a Web application, the user could reasonably specify the query over a sort of global map schema, containing all the feature types with the maximum dimension they have in the database. For the maps introduced in Subsection 6.1, this means that the query is specified over the schema of M1. Given the query, in the first step the GIS server returns the most general map, that is, the map with the least information but also the lowest transmission cost. For example, if the query asks for all crossing road/province pairs, it can first be executed over M3 (the most generalized map containing all the required feature types).
After this first interaction, if the user wants more details about the received map, the system can execute the query again over more specialized maps, substituting the topological relations by means of the topological distance function. In the proposed example, at the second step the query can be executed over M2 and then over M1. In both cases, the predicate Cross is no longer meaningful, since roads and provinces are now both regions and Cross is not defined between pairs of regions. Overlap, which is dist-based consistent with Cross when considering pairs of regions, can be used instead.
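The substitution rule can be made concrete with a small sketch. The road/province example above involves line/region pairs, whose distance values are not reproduced in this section, so the sketch uses the region/region versus line/line case mentioned after Proposition 1, for which Table 3 gives every needed value. It assumes, as the phrase "one of the nearest relationships" suggests, that dc returns the relations at minimum distance, and that d is symmetric, so the distance from (R,R) Overlap to each (L,L) relation can be read from the (R,R) Overlap column of the (L,L) rows of Table 3. The dictionary and function names are hypothetical.

```python
# Table 3, (L,L) rows, (R,R) group, column "Overlap": 9 * d between each
# line/line relation and Overlap between two regions.
LL_TO_RR_OVERLAP = {
    "Disjoint": 4, "Touch": 2, "In": 4, "CoveredBy": 3, "Equal": 6,
    "Contains": 4, "Covers": 3, "Overlap": 1, "Cross": 1,
}


def nearest_relations(distances):
    """Relations at minimum distance: an assumed reading of dc.

    Scaling all distances by 9 does not change which relations are nearest.
    """
    best = min(distances.values())
    return {rel for rel, dist in distances.items() if dist == best}


# dc((R,R), Overlap, (L,L)), assuming d is symmetric.
candidates = nearest_relations(LL_TO_RR_OVERLAP)
print(sorted(candidates))  # ['Cross', 'Overlap']
```

The result agrees with dc((R,R), Overlap, (L,L)) = {Overlap, Cross} stated after Proposition 1. By the symmetry property (item 2), Overlap between regions is then also dist-based consistent with Cross between lines, which is the same kind of substitution used when moving from M3 to M2 and M1.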

6.3

Application 2: datasets obtained from different sources

The evaluation of consistency among a set of maps, with respect to the topological relationships among their spatial objects, is an important issue today, since GIS applications of different kinds frequently have to integrate and compare spatial information coming from different sources. The sources may provide maps at different resolutions and, in order to use both datasets in a multiresolution GIS, the consistency of the two datasets must first be determined.


Figure 2: Geographical maps of the considered datasets. (a) Map $I_{prov}$ of provinces of Italy; (b) Map $I_{roads} \cup I_{railways} \cup I_{rivers}$ of roads, railways, and rivers of Italy, represented as lines; (c) Map $I^R_{roads} \cup I^R_{railways} \cup I^R_{rivers}$ of roads, railways, and rivers of Italy, represented as regions; (d) A portion of the $ML1_{roads}$ datasets, where roads are represented both as regions and as lines; (e) Map $LM1^{93}$; (f) Map $LM1^{01}$.

Based on the two proposed consistency measures, it is possible to implement useful tools for comparing maps at the topological level. If the consistency test fails, the violation cases can be ranked according to the topological distance between the topological relationships registered in the two compared maps. Using the consistency ranking value, the user can identify the pairs of objects whose topological relations have changed most. These cases can be considered candidate errors, and the list of all inconsistent objects can be used to generate, by difference, consistent views of the input maps that can be safely used inside GIS applications.

As an example, consider datasets $ML1^R_{roads}$ and $ML1^L_{roads}$. These datasets are neither eq-based nor dist-based consistent. Using the consistency ranking value of Definition 5, we can however determine which pairs of features are dist-based consistent and which are not. By analyzing the ranking values, we discovered that most non-consistent topological relationships correspond to Disjoint in $ML1^R_{roads}$ and Touch in $ML1^L_{roads}$. For instance, consider roads 72 and 160 in Figure 3, which presents two snapshots of corresponding areas of $ML1^R_{roads}$ and $ML1^L_{roads}$. Their relation is Touch in Figure 3(a) and Disjoint in Figure 3(b). According to Table 3, this pair violates both dist-based and eq-based consistency.
On the other hand, roads 70 and 160 are eq-based consistent, since their relation is Touch in both Figure 3(a) and Figure 3(b).
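The ranking of Definition 5 can be sketched for the violation pattern just described: Disjoint between regions in $ML1^R_{roads}$ and Touch between lines in $ML1^L_{roads}$. The sketch takes M1 as the region map and M2 as the line map, uses the (L,L) rows and (R,R) columns of Table 3 divided by 9, and assumes that d is symmetric and that dc returns the relations at minimum distance. These assumptions, and all names, are ours and are not presented as the authors' implementation.

```python
from fractions import Fraction

RR = ["Disjoint", "Touch", "In", "CoveredBy", "Equal", "Contains", "Covers", "Overlap"]
# Table 3: rows are (L,L) relations, columns the (R,R) group; values are 9 * d.
LL_TO_RR = {
    "Disjoint":  [0, 1, 4, 5, 6, 4, 5, 4],
    "Touch":     [1, 0, 2, 2, 3, 2, 2, 2],
    "In":        [4, 3, 0, 1, 4, 6, 7, 4],
    "CoveredBy": [5, 3, 1, 0, 3, 7, 6, 3],
    "Equal":     [6, 4, 4, 3, 0, 4, 3, 6],
    "Contains":  [4, 4, 6, 7, 4, 0, 1, 4],
    "Covers":    [5, 3, 7, 6, 3, 1, 0, 3],
    "Overlap":   [1, 1, 1, 1, 2, 1, 1, 1],
    "Cross":     [1, 1, 1, 1, 2, 1, 1, 1],
}


def d(rr_rel, ll_rel):
    """Distance between an (R,R) relation and an (L,L) relation (assumed symmetric)."""
    return Fraction(LL_TO_RR[ll_rel][RR.index(rr_rel)], 9)


def dc_rr_to_ll(rr_rel):
    """(L,L) relations nearest to rr_rel (assumed reading of dc)."""
    best = min(d(rr_rel, ll) for ll in LL_TO_RR)
    return {ll for ll in LL_TO_RR if d(rr_rel, ll) == best}


def dc_ll_to_rr(ll_rel):
    """(R,R) relations nearest to ll_rel (assumed reading of dc)."""
    best = min(d(rr, ll_rel) for rr in RR)
    return {rr for rr in RR if d(rr, ll_rel) == best}


def ranking_value(theta1, theta2):
    """Sketch of r((f1, f2), M1, M2): theta1 over (R,R) in M1, theta2 over (L,L) in M2."""
    observed = d(theta1, theta2)
    # DC is a set, so equal distances collapse into one element.
    dc_values = {d(theta1, t3) for t3 in dc_rr_to_ll(theta1)}
    dc_values |= {d(t4, theta2) for t4 in dc_ll_to_rr(theta2)}
    return sum(abs(observed - c) for c in dc_values) / len(dc_values)


print(ranking_value("Disjoint", "Touch"))  # 1/9
print(ranking_value("Touch", "Touch"))     # 0
```

Under these assumptions, the Disjoint/Touch pair gets a ranking value of 1/9 (about 0.11), while a pair in Touch in both maps, such as roads 70 and 160, gets 0.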

6.4

Application 3: temporal evolution

The evaluation of changes over time between two maps covering the same (or overlapping) area but generated at different time instants is an interesting application of the proposed similarity measure. The similarity analysis can quantify the changes that occurred over time, and ranking can then show the user which pairs of objects changed most. As an example, consider datasets $M_1 = LM2^{93}_{buildings} \cup LM2^{93}_{streets}$ and $M_2 = LM2^{01}_{buildings} \cup LM2^{01}_{streets}$. Computing the distance between M1 and M2 with α = β = γ = 0.333, we find that Mmd(M1, M2) = 0.2, where Mdd = 0, Mcd = 0.61 and Mtd = 1.21310595 × 10^-5. Notice that: i) the dimensional distance is zero, since the maps have the same schema; ii) since the topological distance between the two maps is very small, only a few changes occurred on the common objects; iii) since the content distance is high, the temporal evolution of this municipality was mainly due to new buildings and new streets. By considering the similarity ranking values of Definition 10, we noticed that the most frequent distance value (0.11) refers to pairs of features (streets or buildings) whose topological relationship changed from Disjoint in M1 to Touch in M2. Thus, we may claim that the small topological change was mainly due to extensions of buildings and streets. The highest, but not very frequent, ranking value was 0.45, obtained when the topological relationship between streets changed from Overlap in M1 to Disjoint in M2, probably due to the introduction of new streets.

Figure 3: Two snapshots of the same area taken from (a) $ML1^L_{roads}$, (b) $ML1^R_{roads}$.

7.

CONCLUSIONS AND FUTURE WORK

In this paper we have presented notions of consistency and similarity for multiresolution maps, extending those previously introduced to cope with changes in feature dimension. To validate the introduced concepts, we have provided experimental results over both real and synthetic datasets, concerning three different application scenarios that require multiresolution maps. As future work, we plan to further elaborate on the proposed scenarios by proposing specific techniques for dealing with multiresolution in query processing and geographical data analysis. In particular, an important topic concerns the use of the proposed concepts in knowledge discovery processes for spatial data. In this context, rules have to be defined to assign a semantics to specific similarity and consistency results, depending on the application context. An additional issue we plan to investigate concerns the use of different partitions of the 9-intersections for defining the reference set of topological relations and, as a consequence, the extension of the proposed results to those new cases.

8.

REFERENCES

[1] ArcGIS Working With Geodatabase Topology. http://www.esri.com/library/whitepapers/pdfs/geodatabasetopology.pdf
[2] H. T. Burns and M. J. Egenhofer. Similarity of Spatial Scenes. In Proc. 7th Int. Symp. on Spatial Data Handling, pages 31-42, 1996.
[3] E. Clementini, P. di Felice, and P. van Oosterom. A Small Set of Formal Topological Relationships Suitable for End-User Interaction. In LNCS 692: Proc. 3rd Int. Symp. on Advances in Spatial Databases, pages 277-295, 1993.
[4] M. J. Egenhofer and K. Al-Taha. Reasoning about Gradual Changes of Topological Relationships. In LNCS 639: Proc. of Theory and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 196-219, 1992.
[5] M. J. Egenhofer, E. Clementini, and P. di Felice. Evaluating Inconsistency Among Multiple Representations. In Proc. 6th Int. Symp. on Spatial Data Handling, pages 143-160, 1994.
[6] M. J. Egenhofer and R. D. Franzosa. Point-set Topological Spatial Relations. Int. Journal of Geographic Information Systems, 5(2):161-174, 1991.
[7] M. J. Egenhofer and J. Herring. Categorizing Binary Topological Relations Between Regions, Lines and Points in Geographic Databases. Tech. Rep., Dep. of Surveying Engineering, University of Maine, 1990.
[8] M. J. Egenhofer and D. Mark. Modeling Conceptual Neighborhoods of Topological Line-Region Relations. Int. Journal of Geographic Information Systems, 9(5):555-565, 1995.
[9] M. Essid, O. Boucelma, F. M. Colonna, and Y. Lassoued. Query Processing in a Geographic Mediation System. In Proc. 12th ACM Int. Work. on Geographic Information Systems, pages 101-108, 2004.
[10] JTS Topology Suite. http://www.vividsolutions.com/jts/jtshome.htm
[11] OpenGeoSpatial Consortium. OpenGIS Simple Features Specification for SQL. Technical Report OGC 99-049, 1999.
[12] Oracle Spatial User's Guide and Reference 10g. http://download-west.oracle.com/docs/cd/B14117_01/appdev.101/b10826.pdf
[13] N. Tryfona and M. J. Egenhofer. Multi-Resolution Spatial Databases: Consistency Among Networks. In Proc. 6th Int. Work. on Foundations of Models and Languages for Data and Objects, pages 119-132, 1996.
[14] N. Tryfona and M. J. Egenhofer. Consistency among Parts and Aggregates: A Computational Model. Transactions in GIS, 1(3):189-206, 1997.
[15] E. Puppo and G. Dettori. Towards a Formal Method for Multiresolution Spatial Maps. In Proc. 4th Int. Symp. on Advances in Spatial Databases, pages 152-169, 1995.
