Modeling Consistency of Spatio-temporal Graphs

G. Del Mondo(a,∗), M. A. Rodríguez(c,∗∗), C. Claramunt(b), L. Bravo(c), R. Thibaud(b)

(a) University of Normandy, INSA/LITIS, Avenue de l'Université, 76801 Saint-Étienne-du-Rouvray Cedex, France
(b) Naval Academy Research Institute, CC600, 29240 Brest Cedex 9, France
(c) Department of Computer Science, University of Concepción, Edmundo Larenas 219, 4070409 Concepción, Chile

Abstract

This work introduces a graph-based approach to the representation of evolving entities in space and time. At an abstract level, the model makes a distinction between filiation and spatial relationships between entities, while at the database level, it stores derivation relationships and determines continuation and spatial relationships in time. An extended relational database specification implements the spatio-temporal graph model. A formal language models integrity constraints that are inherent to the model and those that are semantic and application dependent. The satisfiability of these constraints is studied and an algorithm for checking the consistency of a spatio-temporal graph is provided. An experimental evaluation shows the potential of the model.

Keywords: Spatio-temporal modeling, spatio-temporal graph, spatio-temporal consistency, integrity constraints

∗ Principal corresponding author
∗∗ Corresponding author

Preprint submitted to Data & Knowledge Engineering, December 19, 2012

1. Introduction

Modeling spatio-temporal evolution has long been an object of considerable attention within Geographical Information Systems (GIS) research. This
is still an expected development of GISs, needed to successfully observe changes in natural and urban phenomena and to forecast future trends. One of the main reasons for such a lack of integration of time within GISs lies in the fact that these systems were initially and mainly developed for cartographic purposes, the initial objective being the automation of map production processes [41]. GISs are very specific in the way they represent spatial entities and the relations between them. Spatial entities are commonly related by topological [31, 7, 11] and cardinal relationships [15, 14], and generate complex networks of relations that support spatial reasoning. But spatial entities are not static: they evolve in different manners and form evolution networks whose properties should be studied in order to understand the underlying phenomena [13]. Evolving spatial entities can be ordered by a temporal algebra [1], just as spatial entities can be ordered by some metric in space. However, time by itself cannot represent the evolution of spatial entities, that is, the way those entities appear, change, and generate new entities. Time introduces a specific modeling dimension that complements the thematic (i.e., attributes that qualify the entities) and spatial ones. GISs need a unified conceptual framework that integrates the temporal, spatial, and thematic dimensions [28, 29]. At the conceptual level, the notion of identity has been introduced to qualify different categories of change in space and time [21]. Object-oriented modeling offers several constructs to represent the evolution of entities in space and time [44, 23, 18]. Event-based formalisms and languages [45, 24], and conceptual representations of processes in space and time have been introduced to model spatio-temporal dynamics [40, 35, 43]. Moreover, qualitative reasoning provides modeling constructs and mechanisms to reason about spatio-temporal changes, events, and processes [16, 25, 26, 20, 24, 34, 8].
In particular, networks generated by the evolution of entities encompass many properties whose study should provide new insights into the underlying phenomena [2, 42, 38]. In [36], the properties of these networks have been studied to characterize the spatio-temporal processes that emerge from a cadastral application. In [?], the properties of networks generated by consecutive land-use snapshots have been studied from a qualitative point of view, using fuzzy variables that qualify degrees of change. In a related domain, traffic exchanges generate time-varying networks that can be studied to understand these phenomena and their distribution in space and time [30]. A related work introduces a preliminary modeling approach based on a dependence relation that represents the properties of evolving spatial entities [37]. While related work has been oriented to moving objects and trajectories [35], the work presented in this paper is rather oriented to the modeling of the evolution of
regions. Using a graph-based approach, a spatio-temporal model makes a distinction between time-varying spatial and semantic relations [9]. The work presented in this paper extends our previous work [9] by studying the properties and constraints that can enrich the semantics of the spatio-temporal graph model, and by providing a database implementation of the whole approach. At the conceptual level, filiation and spatial relations are now represented at not necessarily consecutive times. This favors the combination of different spatio-temporal graphs, which extends the expressive power of the model. A spatio-temporal graph is formalized and complemented by several graph manipulation operators, namely graph union, graph intersection, and graph join. At the database level, the paper introduces an extended-relational data model to represent a spatio-temporal graph, which is also used as a formal language to express integrity constraints. It also defines the mapping from the spatio-temporal graph to the spatio-temporal database, and vice versa, highlighting their differences. Given a set of integrity constraints, a spatio-temporal graph is said to be consistent if it satisfies all of them. Consistency is a desirable condition and should be enforced whenever possible, this being an important requirement for successful data manipulation. Indeed, inconsistent data may have a negative impact on any data analysis and processing. Consider, for example, a land-management application that monitors the historical evolution of land parcel ownership. Assume that the geometry of a land parcel is modified such that a part of it is separated and given a new identity belonging to another owner. In such a case, one would expect the area of the original geometry, prior to the modification, to be equal to the sum of the areas of the new geometries generated (the new geometry of the original entity and the geometry of the new entity).
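The area-preservation expectation in the land-parcel example can be checked mechanically. The following is a minimal Python sketch, assuming each parcel geometry is a simple polygon given as an ordered list of (x, y) vertices; the function names, the relative tolerance and the sample parcels are hypothetical. Note that area equality is only a necessary condition: it does not detect parts that overlap each other or extend outside the original parcel.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon (vertices in order, not repeated) via the shoelace formula."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def split_preserves_area(original: List[Point], parts: List[List[Point]],
                         rel_tol: float = 1e-9) -> bool:
    """True if the parcel area before the split equals the summed area of the resulting parcels."""
    before = polygon_area(original)
    after = sum(polygon_area(p) for p in parts)
    return abs(before - after) <= rel_tol * max(before, 1.0)


# Hypothetical parcel: a 10 x 4 rectangle split into a 6 x 4 and a 4 x 4 parcel.
parcel = [(0, 0), (10, 0), (10, 4), (0, 4)]
kept = [(0, 0), (6, 0), (6, 4), (0, 4)]
sold = [(6, 0), (10, 0), (10, 4), (6, 4)]
print(split_preserves_area(parcel, [kept, sold]))  # True: 24 + 16 == 40
```

In a spatial DBMS the same check would typically use the system's own area function instead of a hand-written shoelace formula.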
If this is not the case, the dataset is not consistent, which may lead to problems for tax assignments. Consistency is typically enforced when new data is inserted or updated in a database. However, in many cases, mechanisms for enforcing these constraints at update time are not available, or not even feasible, because of time-lagged updates or the integration of different data sources. In cases of inconsistency, data cleaning strategies [33] or inconsistency-tolerant approaches to query answering [32] are applied. This paper contributes to both kinds of applications, since it defines integrity constraints, analyzes basic properties of these constraints, and provides algorithms for (in)consistency checking.

The remainder of this paper is organized as follows. Section 2 introduces and formalizes the spatio-temporal graph model. Section 3 develops an extended-relational data model as a spatio-temporal database model. Section 4 presents semantic integrity constraints of the spatio-temporal database, discusses the consistency of the constraints, and evaluates the consistency of a graph instance. Section 5 gives an experimental evaluation of (in)consistency checking applied to evolving land parcels in a cadastral application. Finally, Section 6 concludes the paper and draws further research directions.

2. Spatio-temporal graph model

In order to develop the modeling approach, let us introduce the following spatial and temporal notions.

Temporal dimension. Time is modeled as a set of discrete elements (i.e., time instants) at which entities may be observed. The set of time instants is partially ordered by ≤. Any phenomenon is observed at a level of granularity that is application dependent. For example, we may describe evolving entities along hours, days, weeks, months, seasons, years, and so on, each of which defines a time granularity.

Spatio-temporal entities. A spatio-temporal entity (also denoted entity) represents an abstraction of the real world. An entity has a fixed identity and a type that describes its semantics; for example, types of entities are counties, cities, lakes, and so on. An entity can also have time-dependent thematic and spatial properties. For example, the geometry of an entity is a property that can vary across time instants. Relations connect entities at the same or different time instants. In this model, topological and filiation relations are taken into account.

Topological relations. Entities are primarily related in space by topological relations¹, whose properties have been extensively studied in the literature. Let us consider the eight jointly exhaustive and pairwise disjoint topological relations between regions defined by the RCC8 [31] and the 9-Intersection model [10], known as basic topological relations.
From these basic topological relations, one can also derive topological relations as conjunctive formulas. For example, the relation PP means that two geometries are related by either TPP or NTPP. Table 1 gives the topological relations definable in terms of the binary predicate C(x, y) of the RCC8 model [31], whose semantics is that "geometry x is connected to geometry y".

Relation | Interpretation | Definition
DC(x, y)* | x is disconnected from y | ¬C(x, y)
P(x, y) | x is a part of y | ∀z(C(z, x) → C(z, y))
PP(x, y) | x is a proper part of y | P(x, y) ∧ ¬P(y, x)
EQ(x, y)* | x is equivalent to y | P(x, y) ∧ P(y, x)
O(x, y) | x overlaps y | ∃z(P(z, x) ∧ P(z, y))
DR(x, y) | x is internally disconnected from y | ¬O(x, y)
PO(x, y)* | x partially overlaps y | O(x, y) ∧ ¬P(x, y) ∧ ¬P(y, x)
EC(x, y)* | x is externally connected to y | C(x, y) ∧ ¬O(x, y)
TPP(x, y), TPPi(y, x)* | x is a tangential proper part of y | PP(x, y) ∧ ∃z(EC(z, x) ∧ EC(z, y))
NTPP(x, y), NTPPi(y, x)* | x is a nontangential proper part of y | PP(x, y) ∧ ¬∃z(EC(z, x) ∧ EC(z, y))

Table 1: Relations definable in terms of C(x, y) (* basic relations)

¹ Other qualitative relations, such as cardinal and distance relations, are of limited interest in the context of this work.
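Table 1 also suggests a compact way to handle the non-basic relations in code: each of them holds exactly when one of a fixed set of basic relations holds (for instance, PP holds when TPP or NTPP holds). The Python sketch below encodes that correspondence, the converse of each basic relation, and the basic relations that may label graph edges (all but DC); the names `RCC8`, `DERIVED`, `converse` and `holds` are hypothetical and not part of the model.

```python
from enum import Enum
from typing import Dict, FrozenSet


class RCC8(Enum):
    """The eight basic relations marked with * in Table 1."""
    DC = "DC"
    EC = "EC"
    PO = "PO"
    EQ = "EQ"
    TPP = "TPP"
    TPPI = "TPPi"
    NTPP = "NTPP"
    NTPPI = "NTPPi"


# Each derived relation of Table 1 as the set of basic relations it covers.
DERIVED: Dict[str, FrozenSet[RCC8]] = {
    "C": frozenset(RCC8) - {RCC8.DC},               # connected: anything but DC
    "P": frozenset({RCC8.TPP, RCC8.NTPP, RCC8.EQ}),  # part of
    "PP": frozenset({RCC8.TPP, RCC8.NTPP}),          # proper part of
    "O": frozenset({RCC8.PO, RCC8.EQ, RCC8.TPP, RCC8.NTPP, RCC8.TPPI, RCC8.NTPPI}),
    "DR": frozenset({RCC8.DC, RCC8.EC}),             # no common interior (not O)
}

# Converse relation: R(x, y) holds iff CONVERSE[R](y, x) holds.
CONVERSE = {RCC8.TPP: RCC8.TPPI, RCC8.TPPI: RCC8.TPP,
            RCC8.NTPP: RCC8.NTPPI, RCC8.NTPPI: RCC8.NTPP}

# Basic relations that may label edges: DC is assumed by default and never stored.
EDGE_BASIC_RELATIONS = frozenset(RCC8) - {RCC8.DC}


def converse(r: RCC8) -> RCC8:
    return CONVERSE.get(r, r)  # DC, EC, PO and EQ are symmetric


def holds(derived: str, basic: RCC8) -> bool:
    """Does the derived relation hold when the basic relation between x and y is `basic`?"""
    return basic in DERIVED[derived]


print(holds("PP", RCC8.TPP), holds("PP", RCC8.EQ))  # True False
```

Representing derived relations as sets of basic ones keeps the eight basic relations as the only thing a geometry engine has to compute.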

Filiation relations. Filiation relations are intimately associated with the concept of identity. Identity has been defined as the trait that distinguishes an object from all others [21, 22]. When two entities are related by a filiation relation, there is a dependence of identity between them. Two types of filiation relations are generally distinguished: continuation (ρc) and derivation (ρd). There is a continuation relation ρc if entity e at time ti continues at time tj, with ti < tj. Notice that a continuation of an entity between ti and tj does not imply that the entity exists at every time t′ such that ti ≤ t′ ≤ tj. The continuation relation is transitive: let ti, tj and tk be time instants such that ti < tj < tk, and let e be an entity; if there is a ρc for e between ti and tj and also between tj and tk, then there is a continuation relation for e between ti and tk. A derivation, denoted by ρd, occurs when an entity e at ti derives into a second entity e′ at tj, where ti ≤ tj. Unlike continuation relations, derivation relations can hold between entities at the same time instant. Also, the derivation relation is not necessarily transitive, since this depends on the particular semantics of the derivation relation. Some specific types of derivation relations, which are of particular interest to characterize geographic phenomena, are considered later in this paper (see Section 4). For example, a type of derivation relation exists between an entity and two or more entities when the geometry of the first one splits to create the other entities. The semantics of this derivation relation implies that the original entity ceases to exist after giving birth to the new entities, but it also implies some topological relations between the original and the new geometries. This semantics is captured by introducing a set of integrity constraints in Section 4.

Following ideas from our previous work [9], the basic notion of a spatio-temporal graph is introduced next.

Spatio-temporal graph. At an abstract level, entities (i.e., vertices) are related by spatial and filiation binary relations (i.e., edges). All topological relations encompass a form of spatial connection, with the exception of the disjoint relation DC, which is therefore assumed by default and not represented in the graph; representing it explicitly would artificially increase the complexity of the graph. A related work [9] gave a conceptual definition of a spatio-temporal graph that results from the aggregation of different subgraphs, distinguishing between entities connected at the same time instant (filiation and spatial subgraphs) and at different time instants (spatio-temporal or temporal-filiation subgraphs). Unlike [9], this work defines the spatio-temporal graph without distinguishing between graphs with entities at a single time instant and graphs with entities at different time instants. This provides a more general and homogeneous definition of graphs, where it is always possible to specify subgraphs based on temporal criteria. The model also allows the connection of entities at non-consecutive time instants, which makes possible a more general definition of operations over spatio-temporal graphs. Having relations between non-consecutive time instants produces a potentially quadratic number of spatial relations between entities at different time instants.
However, a number of these relations can be derived using the composition of relations [12] and, therefore, they need not be explicitly modeled in the spatio-temporal graph.

2.1. Formalization of the graph model

For a given spatio-temporal entity e, we denote by Id(e), Type(e), and Geom(e, t) its identity, type, and geometry at time t, respectively. Other (possibly) time-dependent thematic properties of entities are introduced later (see Section 4). In general, the value of a thematic attribute U of entity e at time t is denoted by U(e, t). A time domain T is a list T = (t1, ..., tn) of time instants at the same granularity [4], where ti < ti+1 for every i ∈ [1, n − 1]. These principles can be easily
adjusted to consider a time domain with a partial order but, in order to simplify the presentation, we restrict ourselves to time with a total order. Let us consider a set of topological relations Γ and a set of filiation relations Φ. In general, unless explicitly noted, we assume that Γ contains the topological relations defined in Table 1, excluding DC, and that Φ = {ρc, ρd}, that is, the continuation and derivation relations. A topological relation T ∈ Γ between the geometry of entity ei at time instant ti and the geometry of entity ej at time instant tj is denoted by T[(ei, ti), (ej, tj)], which is equivalent to saying that T(Geom(ei, ti), Geom(ej, tj)) holds (with T as defined in Table 1, excluding relation DC). A filiation relation ρ ∈ Φ between entity ei at time ti and entity ej at time tj is denoted by ρ[(ei, ti), (ej, tj)]. In particular, since the continuation relation ρc applies to a single entity, ei must be equal to ej; that is, ρc[(e, ti), (e, tj)] denotes the continuation relation of entity e between ti and tj. The definition of a spatio-temporal graph G with a set V of vertices and two sets of edges EΓ and EΦ is now formally introduced.

Definition 1 (Spatio-temporal graph). Given a domain of entity labels I, a time domain T, a set of topological relations Γ, and a set of filiation relations Φ, a spatio-temporal graph G is a tuple (V, EΓ, EΦ) where:
(i) V is a set of vertices (e, t), where e ∈ I and t ∈ T;
(ii) EΓ is a set of tuples ((ei, ti), T, (ej, tj)), where (ei, ti), (ej, tj) ∈ V, ti ≤ tj and T ∈ Γ; and
(iii) EΦ is a set of tuples ((ei, ti), ρ, (ej, tj)), where (ei, ti), (ej, tj) ∈ V, ti ≤ tj and ρ ∈ Φ. □

Intuitively, V is the set of vertices denoting entities existing at specific time instants in T, and there is an edge with label l between two vertices if and only if there is a topological or filiation relation l between them.
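Definition 1 can be mirrored by a small data structure that rejects edges violating its conditions. The following Python sketch is one possible encoding, not the paper's implementation: vertices are (entity label, time) pairs with integer times (e.g., years), relation labels are plain strings, and the class name `STGraph` and the string names used for ρc and ρd are hypothetical. The sample edges are taken from the regions of Example 1 (R1 keeps its identity from 1976 to 2006 and derives R3).

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Vertex = Tuple[str, int]           # (entity label e, time instant t)
Edge = Tuple[Vertex, str, Vertex]  # ((ei, ti), label, (ej, tj))

# Gamma: the Table 1 relations excluding DC; Phi: continuation and derivation.
GAMMA = {"P", "PP", "EQ", "O", "DR", "PO", "EC", "TPP", "TPPi", "NTPP", "NTPPi"}
RHO_C, RHO_D = "rho_c", "rho_d"
PHI = {RHO_C, RHO_D}


@dataclass
class STGraph:
    V: Set[Vertex] = field(default_factory=set)
    E_gamma: Set[Edge] = field(default_factory=set)
    E_phi: Set[Edge] = field(default_factory=set)

    def _check_endpoints(self, vi: Vertex, vj: Vertex) -> None:
        if vi not in self.V or vj not in self.V:
            raise ValueError("both endpoints must be vertices of the graph")
        if vi[1] > vj[1]:
            raise ValueError("edges require ti <= tj")

    def add_topological(self, vi: Vertex, rel: str, vj: Vertex) -> None:
        self._check_endpoints(vi, vj)
        if rel not in GAMMA:
            raise ValueError(f"{rel} is not in Gamma")
        self.E_gamma.add((vi, rel, vj))

    def add_filiation(self, vi: Vertex, rho: str, vj: Vertex) -> None:
        self._check_endpoints(vi, vj)
        if rho not in PHI:
            raise ValueError(f"{rho} is not in Phi")
        # Continuation links the same entity at two distinct instants ti < tj.
        if rho == RHO_C and (vi[0] != vj[0] or vi[1] == vj[1]):
            raise ValueError("continuation requires ei = ej and ti < tj")
        self.E_phi.add((vi, rho, vj))


g1 = STGraph(V={("R1", 1976), ("R1", 2006), ("R3", 2006)})
g1.add_filiation(("R1", 1976), RHO_C, ("R1", 2006))
g1.add_filiation(("R1", 1976), RHO_D, ("R3", 2006))
```

Validating on insertion keeps every stored edge within the conditions of Definition 1; the semantic constraints of Section 4 would still need separate checks.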
Edges in EΓ and EΦ represent topological and filiation relations, respectively.

Example 1. We illustrate the interest of the spatio-temporal graph model with a representative case study that shows the evolution of regions and provinces in Chile. The spatio-temporal graphs in Figures 1 and 2 represent part of the evolution of regions (at 1976 and 2006) and provinces (at 1925, 1976 and 2006) of Chile². Before 1976, only provinces existed; they were then aggregated into regions after 1976. In this model, regions and provinces keep the same identity when they keep the same capital. Also, notice that even if regions or provinces keep the same identity, their geometries may change and, therefore, the spatial relations between geometries of the same identity at different time instants are not necessarily EQ. Figure 1 also shows two maps of the whole country at 1976 and 2006, highlighting the regions that are modeled by the spatio-temporal graph. Similarly, Figure 2 shows the evolution of provinces and how their boundaries change from 1925 to 2006.

² In figures, dashed lines and solid lines represent filiation and spatial relations, respectively.

Figure 1: Spatio-temporal graph G1 of the evolution of regions in Chile (regions R1 and R2 at 1976; R1, R2, R3 and R4 at 2006)

2.2. Operations over spatio-temporal graphs

When representing a given phenomenon, it is relatively common to have complementary data sources that together enrich the semantics of the application. Combining different spatio-temporal graphs into a graph with a convergent semantics therefore requires appropriate graph-manipulation operators. The following definition introduces three basic operators over graphs.

Definition 2 (Operators). Given a time domain T and graphs G1 = (V1, EΓ1, EΦ1) and G2 = (V2, EΓ2, EΦ2):

(i) The union graph of G1 and G2 is the spatio-temporal graph G1 ∪ G2 = (V′, EΓ′, EΦ′), where V′ = V1 ∪ V2, EΓ′ = EΓ1 ∪ EΓ2 and EΦ′ = EΦ1 ∪ EΦ2.
(ii) The intersection graph of G1 and G2 is the spatio-temporal graph G1 ∩ G2 = (V′, EΓ′, EΦ′), where V′ = V1 ∩ V2, EΓ′ = EΓ1 ∩ EΓ2 and EΦ′ = EΦ1 ∩ EΦ2.
(iii) The join of the two graphs connects entities of one graph to entities of the other. Consider (a) a set RΓ of topological relations between vertices such that if T[(ei, ti), (ej, tj)] ∈ RΓ, then (ei, ti) ∈ V1, (ej, tj) ∈ V2, ti, tj ∈ T, ti < tj, and T ∈ Γ; and (b) a set RΦ of filiation relations between vertices such that if ρ[(ei, ti), (ej, tj)] ∈ RΦ, then (ei, ti) ∈ V1, (ej, tj) ∈ V2, ti, tj ∈ T, ti < tj, and ρ ∈ Φ. The join of G1 and G2 with relations RΓ and RΦ is the spatio-temporal graph G1 ⋈ G2(RΓ, RΦ) = (V′, EΓ′, EΦ′), where V′ = V1 ∪ V2, EΓ′ = EΓ1 ∪ EΓ2 ∪ {(v, T, w) | T[v, w] ∈ RΓ}, and EΦ′ = EΦ1 ∪ EΦ2 ∪ {(v, ρ, w) | ρ[v, w] ∈ RΦ}. □

Figure 2: Spatio-temporal graph G2 of the evolution of provinces (provinces P1 to P12 at 1925, 1976 and 2006)

Figure 3 provides an example of the join operator.

Figure 3: Example of join between G1 and G2 for RΓ = {T1[(w, ti), (x, tj)], T2[(v, tk), (y, tl)]} and RΦ = {ρd[(w, ti), (x, tj)]} (panels: G1, G2, and G1 ⋈ G2(RΓ, RΦ), over time instants ti, tj, tk, tl)
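Definition 2 translates almost directly into set operations. The Python sketch below represents a graph as an immutable triple of sets (the `Graph` type and the operator function names are hypothetical) and checks the join preconditions, that is, every added edge goes from a vertex of G1 to a vertex of G2 at a strictly later time; it does not check that labels belong to Γ or Φ. The example is a simplified version of Figure 3 that keeps only the vertices touched by RΓ and RΦ, with hypothetical integer values standing for ti < tj < tk < tl and the labels T1 and T2 kept symbolic.

```python
from typing import FrozenSet, NamedTuple, Tuple

Vertex = Tuple[str, int]           # (entity label, time instant)
Edge = Tuple[Vertex, str, Vertex]  # (v, label, w)


class Graph(NamedTuple):
    V: FrozenSet[Vertex]
    E_gamma: FrozenSet[Edge]
    E_phi: FrozenSet[Edge]


def union(g1: Graph, g2: Graph) -> Graph:
    return Graph(g1.V | g2.V, g1.E_gamma | g2.E_gamma, g1.E_phi | g2.E_phi)


def intersection(g1: Graph, g2: Graph) -> Graph:
    return Graph(g1.V & g2.V, g1.E_gamma & g2.E_gamma, g1.E_phi & g2.E_phi)


def join(g1: Graph, g2: Graph, r_gamma: FrozenSet[Edge], r_phi: FrozenSet[Edge]) -> Graph:
    """Union of g1 and g2 plus connecting edges from g1 (earlier) to g2 (later)."""
    for v, _, w in r_gamma | r_phi:
        if v not in g1.V or w not in g2.V or not v[1] < w[1]:
            raise ValueError(f"invalid join edge {v} -> {w}")
    return Graph(g1.V | g2.V, g1.E_gamma | g2.E_gamma | r_gamma,
                 g1.E_phi | g2.E_phi | r_phi)


ti, tj, tk, tl = 1, 2, 3, 4
G1 = Graph(frozenset({("w", ti), ("v", tk)}), frozenset(), frozenset())
G2 = Graph(frozenset({("x", tj), ("y", tl)}), frozenset(), frozenset())
R_gamma = frozenset({(("w", ti), "T1", ("x", tj)), (("v", tk), "T2", ("y", tl))})
R_phi = frozenset({(("w", ti), "rho_d", ("x", tj))})
J = join(G1, G2, R_gamma, R_phi)
```

Because the join adds only edges that start in G1 and end in G2, it behaves like a union plus a controlled set of bridging edges.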

3. Spatio-temporal database

The implementation of a spatio-temporal graph uses an extended-relational model, as supported by current spatial databases such as Postgres+PostGIS or Oracle Spatial, which are OpenGIS compliant [27]. This section introduces a schema to represent the spatio-temporal graph, over which one can specify integrity constraints (ICs) to maintain the consistency of the data, or to detect inconsistencies and clean the data.

3.1. Database schema and instances

In order to facilitate the presentation of the database schema, the following sets of attributes are distinguished.


Definition 3. Given a set U of possible attribute values, a time domain T, a set of topological relations Γ and a set Φ of filiation relations, attribute names are classified according to the type of data they store:
(i) The set AU of possible thematic, non-geometric, attribute names that take values in U.
(ii) The set AS of possible geometric attribute names that take admissible values in P(R²), the power set of R².
(iii) The set AT of possible temporal attribute names that take values in T.
(iv) The set AΦ of possible filiation attribute names that take values in Φ. □

Spatial objects can be modeled with different abstraction mechanisms. For values of attributes in AS, we concentrate on regions in a 2D space to model³ objects that have an extent, which are useful in a broad class of GIS applications. A finite representation of geometries is considered, which is compatible with the specification of spatial data types and spatial relations as found in current SDBMSs [27]. An admissible geometry of the Euclidean plane is a closed and bounded polygonal region with a non-zero area, and Ad denotes the set of admissible geometries; attributes with names in AS take values from Ad. The schema of a database table R with a list of attributes Ai, where i ∈ [1, n], is denoted by R[A1, ..., An]. A spatio-temporal database schema contains tables that represent entities and tables that represent relationships between entities at different time instants. The schema Σ of a database is the set of its table schemas. An instance D of a schema Σ is a set of predicates P(a1, ..., an) (i.e., (a1, ..., an) is a tuple of table P) for which P[A1, ..., An] ∈ Σ. The schema (and instance) can be extended to also consider a domain for each attribute; in that case, the value ai corresponding to an attribute Ai should belong to the domain of Ai.
³ Although this work concentrates on a 2D space, it can be extended to handle objects in a higher-dimensional space by redefining the set of topological relations in Γ.

The following definitions introduce the components of a spatio-temporal database that are able to deal with entities and with both topological and filiation relations.

Entities. Spatio-temporal entities are represented by entity tables, which store information about the entities at different time instants.

Definition 4. An entity table is of the form E[ID, Time, U1, ..., Un, G], where ID ∈ AU, Time ∈ AT, U1, ..., Un is a (possibly empty) sequence of thematic attributes with
Ui ∈ AU for i ∈ [1, n], and G ∈ AS. Attributes (ID, Time) are the table's primary key. □

Without loss of generality, we assume that the name of an entity table corresponds to the Type of the entities it stores. If this is not desired, it is sufficient to include the type as a thematic attribute of the table. A tuple (id1, t1, u1, ..., un, g1) belongs to table E if every value belongs to the domain of the respective attribute, and there exists an entity e at time t1 such that Id(e) = id1 and Geom(e, t1) = g1. The values u1, ..., un of the thematic attributes correspond to values in the domain U that are associated with an entity at a time instant. This work considers that the thematic and geometric attributes in the spatio-temporal database are time-dependent; otherwise, entity tables would have to be decomposed into tables with time-dependent attributes and tables with time-independent attributes to keep the database normalized.

Topological Relations. Unlike the spatio-temporal graph model, where topological relations are explicitly represented by edges in the graph, an extended-relational database does not need to store them explicitly, since they can be derived on demand from the geometries stored in the tables. Therefore, no table in the spatio-temporal schema stores these relations.

Filiation Relations. Most filiation relations cannot be derived from the geometries and, therefore, need to be explicitly stated. Continuation relations are an exception, since they depend only on the entities' identifiers: they do not need to be explicitly stored, but can be computed on the fly from the ID attribute of the entity tables. The other filiation relations between entities are represented by a filiation table that stores the entities involved and the type of filiation relation.
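Computing continuation relations on the fly amounts to a self-join of an entity table on its ID attribute. The following Python sketch uses the standard-library `sqlite3` module as a stand-in for a spatial DBMS such as PostGIS; it loads the Region rows of Example 2 (Figure 4) with geometries reduced to the placeholder labels g1 to g6, and the helper name `continuation_relations` is hypothetical (it assumes the time attribute is called Year, as in that example). Because the join pairs every two observations of the same ID with t1 < t2, it also returns the pairs implied by transitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Region (ID TEXT, Year INTEGER, Name TEXT, G TEXT, "
             "PRIMARY KEY (ID, Year))")
conn.executemany("INSERT INTO Region VALUES (?, ?, ?, ?)", [
    ("R1", 1976, "Tarapacá", "g1"), ("R2", 1976, "Los Lagos", "g2"),
    ("R1", 2006, "Tarapacá", "g3"), ("R2", 2006, "Los Lagos", "g4"),
    ("R3", 2006, "Arica y Parinacota", "g5"), ("R4", 2006, "Los Ríos", "g6"),
])


def continuation_relations(conn: sqlite3.Connection, table: str) -> list:
    """Return (ID, Year1, ID, Year2) for every pair of observations of one entity.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    sql = (f"SELECT a.ID, a.Year, b.ID, b.Year FROM {table} AS a "
           f"JOIN {table} AS b ON a.ID = b.ID AND a.Year < b.Year "
           "ORDER BY a.ID, a.Year, b.Year")
    return conn.execute(sql).fetchall()


print(continuation_relations(conn, "Region"))
# [('R1', 1976, 'R1', 2006), ('R2', 1976, 'R2', 2006)]
```

In a real deployment the same query could be exposed as a view, so that continuation relations never need to be stored in a filiation table.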
Definition 5. A filiation table is of the form F[ID1, Time1, Φ, ID2, Time2], where ID1, ID2 ∈ AU, Φ ∈ AΦ and Time1, Time2 ∈ AT. The primary key of the table contains all its attributes. □

In general, there is a filiation table for each pair of entity types that have filiation relations. Without loss of generality, we assume that if there is a filiation relation ρ[(e1, t1), (e2, t2)], there is a filiation table between Type(e1) and Type(e2) that represents it. The name of that filiation table is F^{Type2}_{Type1}, where Type1 = Type(e1) and Type2 = Type(e2). For example, if both regions and provinces can be derived from a province, the filiation information is stored in tables F^{region}_{province} and F^{province}_{province}. The first one holds the filiation relations between provinces and regions, and the second one those between provinces and provinces. This way of storing the filiation information allows us to define foreign keys between F^{Type2}_{Type1}[ID1, Time1, Φ, ID2, Time2] and the entity tables Type1 and Type2, namely F^{Type2}_{Type1}[ID1, Time1] ⊆ Type1[ID, Time] and F^{Type2}_{Type1}[ID2, Time2] ⊆ Type2[ID, Time].

A tuple (id1, t1, ρ, id2, t2) belongs to a filiation table F if every value belongs to the domain of the respective attribute and there exist entities e1 and e2 at times t1 and t2, respectively, such that Id(e1) = id1, Id(e2) = id2, ρ ≠ ρc, and ρ[(e1, t1), (e2, t2)] holds.

Definition 6. The schema Σ of a spatio-temporal database is a tuple (RE, RΦ), where RE and RΦ are sets of entity and filiation tables, respectively. □

Generally, the definition of a database schema considers a single set of table schemas. Here we separate them into two sets so that entity and filiation tables are easy to distinguish. A database instance D is a set of tuples belonging to the tables in RE and RΦ that satisfy all the domain conditions given above for each type of table. We refer to the combination of the schema Σ and instance D as a spatio-temporal database and denote it by M = ⟨Σ, D⟩.

3.2. Mapping database and graph representations

This section presents the transformation from a spatio-temporal graph to a spatio-temporal database, and vice versa. It uses the convention that the name of an entity table is the Type of the entities the table stores, and that F^{Type2}_{Type1} denotes filiation relations between entities of types Type1 and Type2.

From database to graph.
Given a set U of attribute values, a time domain T, a set of topological relations Γ, a set Φ of filiation relations, and a spatio-temporal database M = ⟨Σ, D⟩ with schema Σ = (RE, RΦ) and a database instance D of Σ, the following mapping rules define a spatio-temporal graph G = (V, EΓ, EΦ):
• V = {(e, t) | E(id, t, u1, ..., un, g) ∈ D, E[ID, Time, U1, ..., Un, G] ∈ RE, and e is an entity with Id(e) = id, U1(e, t) = u1, ..., Un(e, t) = un, Geom(e, t) = g, Type(e) = E}.


• EΓ = {((e1, t1), T, (e2, t2)) | {E1(id1, t1, u1, ..., un, g1), E2(id2, t2, v1, ..., vm, g2)} ⊆ D, {E1[ID, Time, U1, ..., Un, G], E2[ID, Time, U1, ..., Um, G]} ⊆ RE, T(g1, g2)⁴, T ∈ Γ, and e1 and e2 are entities with Type(e1) = E1, Id(e1) = id1, U1(e1, t1) = u1, ..., Un(e1, t1) = un, Geom(e1, t1) = g1, Type(e2) = E2, Id(e2) = id2, U1(e2, t2) = v1, ..., Um(e2, t2) = vm, Geom(e2, t2) = g2}.
• EΦ = {((e1, t1), ρ, (e2, t2)) | F^{E2}_{E1}(id1, t1, ρ, id2, t2) ∈ D, F^{E2}_{E1} ∈ RΦ, and e1 and e2 are entities with Id(e1) = id1, Id(e2) = id2, Type(e1) = E1, Type(e2) = E2}⁵ ∪ {((e, t1), ρc, (e, t2)) | {E(id, t1, u1, ..., un, g1), E(id, t2, v1, ..., vn, g2)} ⊆ D, e is an entity with Id(e) = id, and t1 < t2}.⁶

From graph to database. Given a set U of attribute values, a time domain T, a set of topological relations Γ, a set Φ of filiation relations, and a spatio-temporal graph G = (V, EΓ, EΦ), where for each (e, t) ∈ V there exists a geometry Geom(e, t), the following mapping rules define a spatio-temporal database M1 = ⟨Σ, D⟩ with Σ = (RE, RΦ) and a database instance D of Σ:
• RE = {E[ID, Time, U1, ..., Un, G] | (e, t) ∈ V, Type(e) = E, and entity e has n functions U1, ..., Un that depend on an entity of type E and a time instant, and return a value in U}.
• RΦ = {F^{E2}_{E1}[ID1, Time1, Φ, ID2, Time2] | ((e1, t1), ρ, (e2, t2)) ∈ EΦ, ρ ≠ ρc, Type(e1) = E1 and Type(e2) = E2}.
The database instance D for schema Σ is the union of the following sets:
• {E(id, t, u1, ..., un, g) | (e, t) ∈ V, Type(e) = E, Id(e) = id, E[ID, Time, U1, ..., Un, G] ∈ RE, Ui(e, t) = ui for every i ∈ [1, n], and Geom(e, t) = g}.
• {F^{E2}_{E1}(id1, t1, ρ, id2, t2) | ((e1, t1), ρ, (e2, t2)) ∈ EΦ, Id(e1) = id1, Id(e2) = id2, Type(e1) = E1, Type(e2) = E2, and ρ ≠ ρc}.
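The treatment of filiation edges in the two mapping directions can be sketched in Python as follows. This is an illustration under simplifying assumptions: only vertices and filiation edges are handled (geometries, thematic attributes and topological edges are omitted), and the function names and the `type_of` dictionary are hypothetical. The sample data come from the Province table of Figure 5 (P2 observed at 1925, 1976 and 2006, and the derivation of P6 from P1); the round trip shows that continuation edges, dropped when storing, come back for every pair of instants of the same ID.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Vertex = Tuple[str, int]              # (entity ID, time instant)
Edge = Tuple[Vertex, str, Vertex]
Row = Tuple[str, int, str, str, int]  # (ID1, Time1, Phi, ID2, Time2)
RHO_C = "rho_c"


def graph_to_filiation_tables(e_phi: Set[Edge],
                              type_of: Dict[str, str]) -> Dict[Tuple[str, str], Set[Row]]:
    """Store each non-continuation edge in the table keyed by (Type1, Type2)."""
    tables: Dict[Tuple[str, str], Set[Row]] = defaultdict(set)
    for (e1, t1), rho, (e2, t2) in e_phi:
        if rho == RHO_C:
            continue  # continuation is derivable from IDs, so it is not stored
        tables[(type_of[e1], type_of[e2])].add((e1, t1, rho, e2, t2))
    return dict(tables)


def database_to_filiation_edges(entity_rows: Set[Vertex],
                                tables: Dict[Tuple[str, str], Set[Row]]) -> Set[Edge]:
    """Stored filiation tuples plus a continuation edge for every t1 < t2 of the same ID."""
    edges = {((e1, t1), rho, (e2, t2))
             for rows in tables.values() for (e1, t1, rho, e2, t2) in rows}
    edges |= {((e, t1), RHO_C, (f, t2))
              for (e, t1) in entity_rows for (f, t2) in entity_rows
              if e == f and t1 < t2}
    return edges


V = {("P1", 1925), ("P2", 1925), ("P2", 1976), ("P2", 2006), ("P6", 1976)}
E_phi = {(("P2", 1925), RHO_C, ("P2", 1976)),
         (("P2", 1976), RHO_C, ("P2", 2006)),
         (("P1", 1925), "rho_d", ("P6", 1976))}
type_of = {"P1": "Province", "P2": "Province", "P6": "Province"}

tables = graph_to_filiation_tables(E_phi, type_of)
back = database_to_filiation_edges(V, tables)
```

After the round trip, `back` contains the three original edges plus the continuation of P2 from 1925 to 2006, which illustrates why the two representations are not equivalent.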
⁴ As defined in Table 1, excluding relation DC.
⁵ Since continuation relations are not stored in the database, ρ ≠ ρc.
⁶ The other attributes of vertices (e, t1) and (e, t2) (namely Type, Geom and the other functions) can be obtained from the vertices (e, t1), (e, t2) ∈ V, which always exist.

The instances of entity tables require that the entities in the graph model have a given geometry at each time instant at which they exist. It is possible to relax this condition by including alternative transformation rules. The two representations of spatio-temporal data are not equivalent. Indeed, if a spatio-temporal graph is transformed to a spatio-temporal database, and then the

database is transformed back to the graph, the data may not be the same. The new graph might have:
1. Cleaner data: consider the case in which ((e1, t1), T, (e2, t2)) ∈ EΓ but T(Geom(e1, t1), Geom(e2, t2)) does not hold. The transformation to the database and back to the graph removes that tuple from EΓ.
2. Additional topological and continuation relations: since these relations are computed directly from the geometries and identifiers of the entities, more information can be added.
In the opposite direction, that is, when the database is transformed to the graph and then back to a database, one ends up with the same initial database.

Example 2. Figure 4 shows the spatio-temporal database M1 = ⟨Σ1, D1⟩ that can be obtained from the spatio-temporal graph in Example 1, where the geometries of the entities in the graph are the ones shown in Figure 1. In Σ1, the domain T is a totally ordered set of years and Φ = {ρd}. The tables in Σ1 are:
(i) An entity table Region[ID, Year, Name, G], where (ID, Year) uniquely identifies a region with identity ID at the time instant Year, Name is a thematic attribute denoting the name of the region, and G is a geometric attribute representing the geometry of the region.
(ii) A filiation table F^{Region}_{Region}[ID1, Year1, Φ, ID2, Year2] that represents the derivation relations between regions. □

Note that the database instance in Figure 4 shows redundancy, since entities that do not change are stored at each time instant. This is due to the semantics of the spatio-temporal model, which may have incomplete information about the valid time of an entity: recall that no assumption about the existence of an entity is made between two time instants. We leave the optimization of the spatio-temporal database with respect to redundancy for future work.

Graph operators. We now define the operators of Section 2.2 for the database representation.

Definition 7.
Given a set U of attribute values, a time domain T, a set of topological relations Γ, a set Φ of filiation relations, and two spatio-temporal databases M1 = ⟨Σ1, D1⟩ and M2 = ⟨Σ2, D2⟩, where Σ1 = (RE1, RΦ1) and Σ2 = (RE2, RΦ2):

(i) The union of the spatio-temporal databases, denoted M1 ∪ M2, is ⟨Σ′, D′⟩, where D′ = D1 ∪ D2 and Σ′ = (RE1 ∪ RE2, RΦ1 ∪ RΦ2). The intersection of spatio-temporal databases is defined analogously by replacing ∪ with ∩.

(ii) Given a filiation table $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] with instance DF, the join of M1 and M2 with instance DF of table $F_{E_1}^{E_2}$ is M1 ⋈ M2($F_{E_1}^{E_2}$, DF) = ⟨Σ′, D′⟩, where

$$D' = D_1 \cup D_2 \cup \{F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \in D_F \mid E_1(id_1, t_1, u_1, \ldots, u_n, g) \in D_1,\ E_2(id_2, t_2, v_1, \ldots, v_m, g') \in D_2,\ t_1 \leq t_2\}$$

and Σ′ = (RE1 ∪ RE2, RΦ1 ∪ RΦ2 ∪ {$F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2]}). □

(Graph panel: vertices R1, R2 at 1976 and R1, R2, R3, R4 at 2006, linked by ρc and ρd edges and by the topological relations TPPi and EC.)

Region
ID   Year  Name                G
R1   1976  Tarapacá            g1
R2   1976  Los Lagos           g2
R1   2006  Tarapacá            g3
R2   2006  Los Lagos           g4
R3   2006  Arica y Parinacota  g5
R4   2006  Los Ríos            g6

$F_{\mathrm{Region}}^{\mathrm{Region}}$
ID1  Year1  Φ   ID2  Year2
R1   1976   ρd  R3   2006
R2   1976   ρd  R4   2006

Figure 4: Mapping from the spatio-temporal graph to a spatio-temporal database M1 of regions

Province
ID   Year  Name        G
P1   1925  Tacna       s7
P2   1925  Tarapacá    s8
P3   1925  Valdivia    s9
P4   1925  Chiloé      s10
P5   1925  Llanquihue  s11
P2   1976  Tarapacá    s8
P3   1976  Valdivia    s12
P4   1976  Chiloé      s11
P6   1976  Arica       s13
P7   1976  Parinacota  s14
P8   1976  Iquique     s15
P9   1976  Osorno      s16
P10  1976  Palena      s17
P2   2006  Tarapacá    s8
P3   2006  Valdivia    s12
P4   2006  Chiloé      s10
P6   2006  Arica       s13
P7   2006  Parinacota  s14
P8   2006  Iquique     s18
P9   2006  Osorno      s19
P10  2006  Palena      s17
P11  2006  Tamarugal   s20
P12  2006  Ranco       s21

$F_{\mathrm{Province}}^{\mathrm{Province}}$
ID1  Year1  Φ   ID2  Year2
P1   1925   ρd  P6   1976
P1   1925   ρd  P7   1976
P1   1925   ρd  P8   1976
P3   1925   ρd  P9   1976
P5   1925   ρd  P10  1976
P8   1976   ρc  P8   2006
P8   1976   ρd  P11  2006
P9   1976   ρd  P12  2006

Figure 5: Database M2 of provinces

$F_{\mathrm{Province}}^{\mathrm{Region}}$
ID1  Year1  Φ   ID2  Year2
P6   1976   ρd  R1   1976
P7   1976   ρd  R1   1976
P8   1976   ρd  R1   1976
P2   1976   ρd  R2   1976
P3   1976   ρd  R2   1976
P4   1976   ρd  R2   1976
P9   1976   ρd  R2   1976
P10  1976   ρd  R2   1976

Figure 6: Filiation instance D for a join operation
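The union and join of Definition 7 can be illustrated over a toy in-memory encoding. This is a minimal sketch under stated assumptions, not the authors' implementation: each table is hypothetically a Python set of tuples, entity tuples are (id, time, attrs, geom), filiation tuples are (id1, t1, rho, id2, t2), and the names STDatabase, union and join are illustrative. The join keeps only the filiation tuples whose source exists in E1 of the first database, whose target exists in E2 of the second, and with t1 ≤ t2. Since $F_{\mathrm{Province}}^{\mathrm{Region}}$ goes from provinces to regions, the provinces database is passed first.

```python
from dataclasses import dataclass, field


@dataclass
class STDatabase:
    """Hypothetical toy encoding of M = <Sigma, D>: table name -> set of tuples."""
    entities: dict = field(default_factory=dict)    # R_E: tuples (id, time, attrs, geom)
    filiations: dict = field(default_factory=dict)  # R_Phi: tuples (id1, t1, rho, id2, t2)


def union(m1: STDatabase, m2: STDatabase) -> STDatabase:
    """M1 ∪ M2: schemas and instances are merged table by table."""
    out = STDatabase()
    for part in ("entities", "filiations"):
        for src in (getattr(m1, part), getattr(m2, part)):
            for name, rows in src.items():
                getattr(out, part).setdefault(name, set()).update(rows)
    return out


def join(m1, e1, m2, e2, f_name, d_f):
    """M1 ⋈ M2(F, D_F): the union plus the tuples of d_f whose source (id1, t1)
    is in table e1 of m1, whose target (id2, t2) is in table e2 of m2, and t1 <= t2."""
    out = union(m1, m2)
    keys1 = {(r[0], r[1]) for r in m1.entities.get(e1, set())}
    keys2 = {(r[0], r[1]) for r in m2.entities.get(e2, set())}
    kept = {f for f in d_f
            if (f[0], f[1]) in keys1 and (f[3], f[4]) in keys2 and f[1] <= f[4]}
    out.filiations.setdefault(f_name, set()).update(kept)
    return out


# Small excerpt of Figures 4 to 6 (geometries are plain labels here).
regions = STDatabase({"Region": {("R1", 1976, ("Tarapacá",), "g1"),
                                 ("R2", 1976, ("Los Lagos",), "g2")}})
provinces = STDatabase({"Province": {("P6", 1976, ("Arica",), "s13"),
                                     ("P3", 1976, ("Valdivia",), "s12")}})
d_f = {("P6", 1976, "rho_d", "R1", 1976),
       ("P3", 1976, "rho_d", "R2", 1976),
       ("P99", 1976, "rho_d", "R1", 1976)}  # hypothetical P99 is not in Province: dropped
joined = join(provinces, "Province", regions, "Region", "F_Province^Region", d_f)
print(sorted(joined.filiations["F_Province^Region"]))
```

Sets make the union idempotent, which matches D′ = D1 ∪ D2 in the definition; a real implementation would instead run the corresponding SQL over the extended relational tables.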

Example 3 (continuation of Example 2). In addition to the spatio-temporal database M1 of Example 2, consider a database M2 = ⟨Σ2, D2⟩ with instance D2 shown in Figure 5. Database M2 represents part of the evolution of provinces in Chile from 1925 to 2007. Σ2 has the same time domain T as Σ1, and Φ = {ρd}. The tables in Σ2 are an entity table Province[ID, Year, Name, G] for provinces and a filiation table $F_{\mathrm{Province}}^{\mathrm{Province}}$[ID1, Year1, Φ, ID2, Year2] for the derivation relations between provinces. In addition, consider a filiation table $F_{\mathrm{Province}}^{\mathrm{Region}}$[ID1, Year1, Φ, ID2, Year2] such that $F_{\mathrm{Province}}^{\mathrm{Region}}$[ID1, Year1] ⊆ Province[ID, Year] and $F_{\mathrm{Province}}^{\mathrm{Region}}$[ID2, Year2] ⊆ Region[ID, Year], with instance D shown in Figure 6. The resulting join, denoted M1 ⋈ M2($F_{\mathrm{Province}}^{\mathrm{Region}}$, D), is shown in Figure 7. The schema of the join is composed of two entity tables, Region[ID, Year, Name, G] and Province[ID, Year, Name, G], and three filiation tables, $F_{\mathrm{Region}}^{\mathrm{Region}}$[ID1, Year1, Φ, ID2, Year2], $F_{\mathrm{Province}}^{\mathrm{Province}}$[ID1, Year1, Φ, ID2, Year2] and $F_{\mathrm{Province}}^{\mathrm{Region}}$[ID1, Year1, Φ, ID2, Year2]. Notice that, because topological relations are handled implicitly in the spatio-temporal database, the join of M1 and M2 allows the extraction of topological relations between regions and provinces. □

The spatio-temporal database model is essentially a representation of a graph. As such, it can be related to other graph representations, such as the ones used for the RDF model. Moreover, existing query languages for graphs and strategies for query processing on graphs can be adapted to this model.
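To make the graph reading of the database concrete, the following sketch rebuilds the filiation edges of one entity table from the Region and $F_{\mathrm{Region}}^{\mathrm{Region}}$ instances of Figure 4. It rests on an assumption not spelled out in the text: since continuation relations are not stored, the sketch derives a ρc edge between tuples with the same ID at successive time instants at which that ID is stored. Stored filiation tuples are copied as edges. Topological edges would additionally be computed from attribute G with a spatial library and are omitted. The function name graph_edges is hypothetical.

```python
from collections import defaultdict


def graph_edges(entity_rows, filiation_rows):
    """Return edges ((id1, t1), rho, (id2, t2)) of the graph view of one entity table.

    Assumption: rho_c links the tuples of the same ID at successive stored time
    instants; stored filiation tuples are copied unchanged.
    """
    times = defaultdict(set)
    for eid, t, *_ in entity_rows:
        times[eid].add(t)
    edges = set()
    for eid, ts in times.items():
        ordered = sorted(ts)
        for a, b in zip(ordered, ordered[1:]):   # consecutive instants of the same ID
            edges.add(((eid, a), "rho_c", (eid, b)))
    for id1, t1, rho, id2, t2 in filiation_rows:
        edges.add(((id1, t1), rho, (id2, t2)))
    return edges


# Region and F_Region^Region instances of Figure 4.
region = [("R1", 1976, "Tarapacá", "g1"), ("R2", 1976, "Los Lagos", "g2"),
          ("R1", 2006, "Tarapacá", "g3"), ("R2", 2006, "Los Lagos", "g4"),
          ("R3", 2006, "Arica y Parinacota", "g5"), ("R4", 2006, "Los Ríos", "g6")]
f_region = [("R1", 1976, "rho_d", "R3", 2006), ("R2", 1976, "rho_d", "R4", 2006)]
for edge in sorted(graph_edges(region, f_region)):
    print(edge)
```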

Figure 7: Graph join between provinces and regions (graph: provinces P1 to P12 and regions R1 to R4 at the time instants 1925, 1976 and 2006, linked by the filiation relations ρc and ρd and by topological relations such as EQ, EC, PPi and TPPi)

4. Consistency of spatio-temporal databases

This section introduces a formalization of integrity constraints for spatio-temporal databases. It also analyzes the satisfiability problem of a set of integrity constraints and provides algorithms for checking the consistency of an instance of a spatio-temporal database.

4.1. Integrity constraints

Defining integrity constraints is essential to ensure that a spatio-temporal graph is consistent with the reality being modeled. These constraints capture part of the semantics that should be embedded in the model. This section specifies constraints over spatio-temporal databases and classifies them into model and semantic constraints. Model constraints must be satisfied by every spatio-temporal database. Semantic constraints, on the other hand, are application dependent and impose conditions on the filiation relations.

Model constraints [MC]. In spatio-temporal databases, there exist a number of domain constraints for the geometric representation of spatial objects [27]. In this work, the specification of these constraints is omitted because they are semantically independent of the spatio-temporal database. The following constraints enforce that (i) in every entity table, attributes ID and Time form the primary key; (ii) for every filiation table $F_{E_1}^{E_2}$, there is a foreign key between the table and the corresponding entity tables E1 and E2; and (iii) filiation relations can only exist between different entities and must respect the temporal order ≤. To simplify the presentation, ū is used as a shorthand for u1, ..., un, and ū = v̄ as a shorthand for u1 = v1, ..., un = vn, where n is the number of corresponding attributes in the table.

Definition 8 (MC). Given a spatio-temporal database M = ⟨Σ, D⟩ with Σ = (RE, RΦ), the following model constraints (MC) should be satisfied by D:

1. For every E[ID, Time, U1, ..., Un, G] ∈ RE, an instance D should satisfy the following primary key constraint (PK):

$$\bar{\forall}\big(E(id_1, t_1, \bar{u}, g_1) \wedge E(id_1, t_1, \bar{v}, g_2) \rightarrow (\bar{u} = \bar{v}) \wedge EQ(g_1, g_2)\big) \qquad \text{(PK)}$$

2. For every filiation table $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] and entity tables E1[ID, Time, U1, ..., Un, G] and E2[ID, Time, V1, ..., Vm, G], an instance D should satisfy the following foreign key constraint (FK):

$$\forall \overline{id}\,\rho\,\bar{t}\,\big(F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \rightarrow \exists \bar{u}\,\bar{v}\,\bar{g}\,(E_1(id_1, t_1, \bar{u}, g_1) \wedge E_2(id_2, t_2, \bar{v}, g_2))\big) \qquad \text{(FK)}$$

3. For every $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] ∈ RΦ, an instance D should satisfy the following filiation constraint (FC):

$$\bar{\forall}\big(F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \rightarrow (id_1 \neq id_2) \wedge (t_1 \leq t_2)\big) \qquad \text{(FC)}$$

□

Spatio-temporal semantic constraints [STSC]. A spatio-temporal database leaves the topological relations implicit, since they can be computed on the fly from the geometries of the entities. This section therefore concentrates on semantic constraints for the filiation relations. So far, we have restricted ourselves to continuation relations (ρc) and derivation relations (ρd). However, there are several sub-types of filiation relations. For example, common filiation relations in the geographic domain, besides continuation and derivation, are expansion, contraction, splitting, merging, separation and annexation [17, 6]. As the definitions below show, the first two can be derived from the data in the entity tables and are, therefore, not explicitly represented in any filiation table. Given entities E1(id1, t1, ū1, g1), E2(id2, t2, ū2, g2) and E3(id3, t3, ū3, g3), with t1 ≤ t2, these filiation relations can be defined as follows:

• E2(id2, t2, ū2, g2) is an expansion of E1(id1, t1, ū1, g1) if E1 = E2, id1 = id2 and PP(g1, g2). An entity continues to exist but its geometry expands.
• E2(id2, t2, ū2, g2) is a contraction of E1(id1, t1, ū1, g1) if E1 = E2, id1 = id2 and PPi(g1, g2). An entity continues to exist but its geometry contracts.
• E1(id1, t1, ū1, g1) splits into E2(id2, t2, ū2, g2) and E3(id3, t3, ū3, g3) if id1 ≠ id2, id1 ≠ id3, id2 ≠ id3, t2 = t3, and EQ(g1, GeomUnion({g2, g3})), where GeomUnion is the geometric union of a set of geometries. In addition, the entity with id1 must cease to exist after the split, and the entities with id2 and id3 must not have existed before.
If E1(id1, t1, ū1, g1) splits into E2(id2, t2, ū2, g2) and E3(id3, t3, ū3, g3), then (id1, t1, ρsp, id2, t2) ∈ $F_{E_1}^{E_2}$ and (id1, t1, ρsp, id3, t3) ∈ $F_{E_1}^{E_3}$.
• E2(id2, t2, ū2, g2) is a separation of E1(id1, t2, ū′1, g′1) with respect to E1(id1, t1, ū1, g1) if id1 ≠ id2, EQ(g1, GeomUnion({g2, g′1})), and the entity with id2 is a new entity. This filiation relation is represented in the spatio-temporal database by (id1, t1, ρse, id2, t2) ∈ $F_{E_1}^{E_2}$.
• E1(id1, t1, ū1, g1) and E2(id2, t2, ū2, g2) merge into E3(id3, t3, ū3, g3) if id1 ≠ id2, id1 ≠ id3, id2 ≠ id3, t1 = t2 and EQ(g3, GeomUnion({g1, g2})). In addition, the merged entities must cease to exist, and the new entity formed by the merge must not have existed before. This filiation relation is represented in the spatio-temporal database by (id1, t1, ρme, id3, t3) ∈ $F_{E_1}^{E_3}$ and (id2, t2, ρme, id3, t3) ∈ $F_{E_2}^{E_3}$.

• The annexation of E1(id1, t1, ū1, g1) to E2(id2, t1, ū2, g2) results in an annexed entity E2(id2, t2, ū′2, g′2) if id1 ≠ id2 and EQ(g′2, GeomUnion({g1, g2})). In addition, the entity with id1 ceases to exist. This filiation relation is represented in the spatio-temporal database by (id1, t1, ρan, id2, t2) ∈ $F_{E_1}^{E_2}$.

Note that the definitions of splitting and merging can be extended to several entities: an entity can split into several entities, and several entities can merge into one. Since the semantics of expansion and contraction can be derived from spatial semantic constraints such as those defined in [3], we concentrate on constraints that enforce the conditions of the splitting, separation, annexation and merging relations. The following examples show constraints that characterize these filiation relations. They assume that Φ contains the splitting (ρsp), separation (ρse), merging (ρme) and annexation (ρan) relations, and that the entities involved are of the same type E.

Figure 8: Splitting of regions (diagram: id1 at t1 is linked by ρsp to id2 and id3 at t2)

Example 4 (Splitting). Consider the spatio-temporal database in Figure 8. To check the properties of the splitting of the entity with id1, the following constraints are needed:

$$sp_1:\ \bar{\forall}\big(E(id_1, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{sp}, id_2, t_2) \rightarrow (t' \leq t_1)\big)$$

$$sp_2:\ \bar{\forall}\big(E(id_2, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{sp}, id_2, t_2) \rightarrow (t' \geq t_2)\big)$$

$$sp_3:\ \bar{\forall}\big(E(id_1, t_1, \bar{u}_1, g_1) \wedge E(id_2, t_2, \bar{u}_2, g_2) \wedge E(id_3, t_2, \bar{u}_3, g_3) \wedge id_2 \neq id_3 \wedge F_E^E(id_1, t_1, \rho_{sp}, id_2, t_2) \wedge F_E^E(id_1, t_1, \rho_{sp}, id_3, t_2) \rightarrow EQ(g_1, \mathrm{GeomUnion}(\{g_2, g_3\}))\big)$$

Constraint sp1 ensures that, after splitting, the initial entity ceases to exist, since there is no other tuple in the entity table E associated with the same entity at a time later than t1; constraint sp2 ensures that the splitting creates new entities, since there are no tuples in E before t2 associated with the entities resulting from this derivation; and constraint sp3 checks that the geometry of the original entity is equal to the union of the geometries of the two new entities. □
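The three splitting constraints can be checked directly over a toy instance. The sketch below is illustrative only and relies on a hypothetical raster simplification: each geometry is a frozenset of grid cells, so GeomUnion becomes set union and EQ becomes set equality. Entity tuples are (id, t, cells) and filiation tuples are (id1, t1, rho, id2, t2); check_split is a hypothetical name. As in sp3, only pairs of derived entities are compared, so the check targets binary splits; the grouped check-query QTFD-1 of Section 4.3 handles any number of parts.

```python
def check_split(entities, filiations):
    """Return the set of violations of sp1, sp2 and sp3 (Example 4).

    Hypothetical encoding: a geometry is a frozenset of grid cells, so
    GeomUnion is set union and EQ is set equality (raster simplification).
    """
    geom = {(i, t): g for i, t, g in entities}
    splits = [f for f in filiations if f[2] == "rho_sp"]
    violations = set()
    for id1, t1, _, id2, t2 in splits:
        for i, t, _ in entities:
            if i == id1 and t > t1:          # sp1: the split entity must not survive t1
                violations.add(("sp1", i, t))
            if i == id2 and t < t2:          # sp2: a derived entity must be new at t2
                violations.add(("sp2", i, t))
    for a in splits:                          # sp3: every pair id2 != id3 of one split
        for b in splits:
            if a[:2] == b[:2] and a[4] == b[4] and a[3] < b[3]:
                g1 = geom.get(a[:2])
                g2, g3 = geom.get((a[3], a[4])), geom.get((b[3], b[4]))
                if None not in (g1, g2, g3) and g1 != g2 | g3:
                    violations.add(("sp3", a[0], a[1]))
    return violations


ents = {("a", 1, frozenset({(0, 0), (1, 0)})),
        ("b", 2, frozenset({(0, 0)})),
        ("c", 2, frozenset({(1, 0)}))}
fils = {("a", 1, "rho_sp", "b", 2), ("a", 1, "rho_sp", "c", 2)}
print(check_split(ents, fils))                                     # set()
print(check_split(ents | {("a", 3, frozenset({(1, 0)}))}, fils))   # {('sp1', 'a', 3)}
```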

Figure 9: Separation of regions (diagram: id1 at t1 is linked by ρc to id1 and by ρse to id2 at t2)

Example 5 (Separation). Consider the spatio-temporal database in Figure 9. To check the properties of the separation of the entity with id1, the following constraints are needed:

$$se_1:\ \bar{\forall}\big(E(id_2, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{se}, id_2, t_2) \rightarrow (t' \geq t_2)\big)$$

$$se_2:\ \bar{\forall}\big(E(id_1, t_1, \bar{u}_1, g_1) \wedge E(id_1, t_2, \bar{u}_3, g_2) \wedge E(id_2, t_2, \bar{u}_2, g_3) \wedge F_E^E(id_1, t_1, \rho_{se}, id_2, t_2) \rightarrow EQ(g_1, \mathrm{GeomUnion}(\{g_2, g_3\}))\big)$$

Constraint se1 ensures that the entity created by the separation is indeed a new entity; and constraint se2 ensures that the original geometry of id1 is equal to the geometric union of the geometries of id1 and id2 after id2 separates from id1. □

Example 6 (Merging). Consider the spatio-temporal database in Figure 10. To check the properties of the merge of the entity with id1 with the entity with id2, the following constraints are needed:

$$me_1:\ \bar{\forall}\big(E(id_1, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{me}, id_2, t_2) \rightarrow (t' \leq t_1)\big)$$

$$me_2:\ \bar{\forall}\big(E(id_2, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{me}, id_2, t_2) \rightarrow (t' \geq t_2)\big)$$

$$me_3:\ \bar{\forall}\big(E(id_1, t_1, \bar{u}_1, g_1) \wedge E(id_2, t_1, \bar{u}_2, g_2) \wedge E(id_3, t_2, \bar{u}_3, g_3) \wedge id_1 \neq id_2 \wedge F_E^E(id_1, t_1, \rho_{me}, id_3, t_2) \wedge F_E^E(id_2, t_1, \rho_{me}, id_3, t_2) \rightarrow EQ(g_3, \mathrm{GeomUnion}(\{g_1, g_2\}))\big)$$

Constraint me1 ensures that, after merging, the original entities cease to exist; constraint me2 ensures that the entity resulting from the merge is a new one; and constraint me3 ensures that the geometry of the new entity is equal to the union of the geometries of the two entities from which it is derived. □

Example 7 (Annexation). Consider the spatio-temporal database in Figure 11. To check the properties of the annexation between the entities with id1 and id2, the following constraints are needed:

$$an_1:\ \bar{\forall}\big(E(id_1, t', \bar{u}', g') \wedge F_E^E(id_1, t_1, \rho_{an}, id_2, t_2) \rightarrow (t' \leq t_1)\big)$$

$$an_2:\ \bar{\forall}\big(E(id_1, t_1, \bar{u}_1, g_1) \wedge E(id_2, t_1, \bar{u}_2, g_2) \wedge E(id_2, t_2, \bar{u}_3, g_3) \wedge F_E^E(id_1, t_1, \rho_{an}, id_2, t_2) \rightarrow EQ(g_3, \mathrm{GeomUnion}(\{g_1, g_2\}))\big)$$

Constraint an1 ensures that the annexed entity ceases to exist; and constraint an2 ensures that the resulting geometry of id2 is equal to the geometric union of the geometries of id1 and id2 before id1 is annexed to id2. □

Figure 10: Merging of regions (diagram: id1 and id2 at t1 are linked by ρme to id3 at t2)

Figure 11: Annexation of regions (diagram: id1 at t1 is linked by ρan to id2 at t2, and id2 at t1 is linked by ρc to id2 at t2)

The following definition introduces general semantic filiation constraints that can handle these types of relations as well as more general or more specific ones.

Definition 9 (STSC). Given a set U of attribute values, a time domain T, a set of topological relations Γ, a set Φ = {ρsp, ρse, ρan, ρme} of filiation relations, a set Θ of geometric operators that return a geometry, and a spatio-temporal database schema Σ = (RE, RΦ), we define the following spatio-temporal semantic constraints (STSC) over Σ:

1. Identity-Existence Dependency (IED): Identity-existence dependencies constrain the existence of an entity according to its participation in a derivation relation. If an entity is derived from another entity, then it cannot also be a continuation; analogously, if an entity derives another entity, then it cannot continue to exist. These constraints are formalized as follows:

$$\bar{\forall}\big(E_2(id_2, t_3, \bar{u}_3, g_3) \wedge F_{E_1}^{E_2}(id_1, t_1, \rho_f, id_2, t_2) \rightarrow (t_3 \geq t_2)\big) \qquad \text{(IED-1)}$$

$$\bar{\forall}\big(E_1(id_1, t_3, \bar{u}_3, g_3) \wedge F_{E_1}^{E_2}(id_1, t_1, \rho_f, id_2, t_2) \rightarrow (t_3 \leq t_1)\big) \qquad \text{(IED-2)}$$

where E1 and E2 are not necessarily different entity tables in RE, the filiation table $F_{E_1}^{E_2}$ ∈ RΦ, and ρf ∈ Φ.

2. Topological-Filiation Dependency (TFD): Topological-filiation dependencies constrain the topological relation between the geometries of entities that participate in different types of derivation relations. Formally,

$$\bar{\forall}\Big(\bigwedge_{i=1}^{n} E_i(id_i, t_i, \bar{u}_i, g_i) \wedge \bigwedge_{j,k \in [1,n]} F_{E_j}^{E_k}(id_j, t_j, \rho_f^{jk}, id_k, t_k) \wedge \bigwedge_{l,m \in [1,n]} (id_l \neq id_m) \rightarrow T(\theta_1(\bar{g}_1), \theta_2(\bar{g}_2))\Big)$$

where Ei ∈ RE for all i ∈ [1, n]; $F_{E_j}^{E_k}$ ∈ RΦ for all j, k ∈ [1, n]; T ∈ Γ; $\rho_f^{jk}$ ∈ Φ; θ1, θ2 ∈ Θ; and both ḡ1 and ḡ2 are sequences of geometries in {g1, ..., gn}. Note that a TFD need not include every filiation atom or every inequality; any subset of them can be used. □

These constraints allow us to represent splitting, separation, merging and annexation relations. Indeed, every constraint given in Examples 4, 5, 6 and 7 can be expressed as either an IED or a TFD. For example, consider the case of splitting (Example 4): sp1 and sp2 are IEDs, while sp3 is a TFD defined over three tuples of the same entity table E, where the topological relation T is EQ, θ1 = θ2 = GeomUnion(·), ḡ1 = {g1} and ḡ2 = {g2, g3}. Although the general form of a TFD is more complex than the previous examples, it can formalize several other types of filiation relations.

There exist other types of semantic constraints that do not necessarily describe filiation relations. In particular, it is possible to consider topological dependency constraints that impose topological conditions according to the semantics of entities [3]. For example, land parcels can be disjoint or touch, but cannot be internally connected. In this paper, however, we concentrate on constraints that are specific to spatio-temporal databases. A spatio-temporal database M = ⟨Σ, D⟩ satisfies a set of integrity constraints ψ defined over Σ, denoted D ⊨ ψ, if every ϕ ∈ ψ is satisfied by D.

4.2. Satisfiability of integrity constraints

Satisfiability is a classical reasoning problem for integrity constraints. Intuitively, it consists in determining whether a set of constraints is free of internal contradictions. In classical databases, a set of constraints is said to be satisfiable if there exists a non-empty instance that satisfies the constraints. This definition can easily be adapted to spatio-temporal databases.

Definition 10.
Given a spatio-temporal schema Σ, a set of integrity constraints ψ defined over it is satisfiable if there exists a spatio-temporal database M = ⟨Σ, D⟩ such that D ⊨ ψ and D ≠ ∅. □

Proposition 1. Given a spatio-temporal schema Σ, a set ψ of model and spatio-temporal semantic integrity constraints is always satisfiable. □

Proof: Given a schema Σ = (RE, RΦ) and a set of constraints ψ, there always exists an instance De such that De ⊨ ψ and De ≠ ∅. Let De contain only one tuple, in some entity table E[ID, Time, U1, ..., Un, G] ∈ RE. It is easy to check that De ⊨ ψ: the PK constraint holds because there is a single tuple, and the remaining constraints are vacuously satisfied because their antecedents require filiation tuples and De contains none. □

This notion of satisfiability corresponds to the classical definition for integrity constraints in databases. For spatio-temporal databases, we consider a stronger requirement that checks whether there exists a database with at least one tuple in an entity table and one in a filiation table.

Definition 11. Given a spatio-temporal schema Σ, a set of integrity constraints ψ defined over it is strongly satisfiable if there exists a spatio-temporal database M = ⟨Σ, D⟩ such that: (i) D ⊨ ψ; (ii) there exists E[ID, Time, U1, ..., Un, G] ∈ RE such that D contains a tuple in the entity table E; and (iii) there exists $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] ∈ RΦ such that D contains a tuple in the filiation table $F_{E_1}^{E_2}$. □

As the following example shows, not all sets of MCs and STSCs are strongly satisfiable.

Example 8. Consider a schema Σ = (RE, RΦ) with RE = {E1[ID, Time, U1, ..., Un, G], E2[ID, Time, U1, ..., Um, G]} and RΦ = {$F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2]}, and the following set of constraints ψ:

$$\bar{\forall}\big(E_1(id_1, t_1, \bar{u}, g_1) \wedge E_1(id_1, t_1, \bar{v}, g_2) \rightarrow (\bar{u} = \bar{v}) \wedge (g_1 = g_2)\big) \qquad (1)$$

$$\bar{\forall}\big(E_2(id_1, t_1, \bar{u}, g_1) \wedge E_2(id_1, t_1, \bar{v}, g_2) \rightarrow (\bar{u} = \bar{v}) \wedge (g_1 = g_2)\big) \qquad (2)$$

$$\forall \overline{id}\,\rho\,\bar{t}\,\big(F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \rightarrow \exists \bar{u}\,\bar{v}\,\bar{g}\,(E_1(id_1, t_1, \bar{u}, g_1) \wedge E_2(id_2, t_2, \bar{v}, g_2))\big) \qquad (3)$$

$$\bar{\forall}\big(F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \rightarrow (id_1 \neq id_2) \wedge (t_1 \leq t_2)\big) \qquad (4)$$

$$\bar{\forall}\big(E_1(id_1, t_1, \bar{u}_1, g_1) \wedge E_2(id_2, t_2, \bar{u}_2, g_2) \wedge F_{E_1}^{E_2}(id_1, t_1, \rho_f, id_2, t_2) \rightarrow EQ(g_1, g_2)\big) \qquad (5)$$

$$\bar{\forall}\big(E_1(id_1, t_1, \bar{u}_1, g_1) \wedge E_2(id_2, t_2, \bar{u}_2, g_2) \wedge F_{E_1}^{E_2}(id_1, t_1, \rho_f, id_2, t_2) \rightarrow DC(g_1, g_2)\big) \qquad (6)$$

Constraints (1)-(4) correspond to MCs and (5)-(6) to STSCs, with ρf ∈ Φ. The set of constraints is strongly satisfiable if there exists a database instance, say De, with at least one tuple in $F_{E_1}^{E_2}$ and one in E1 or E2. If a tuple $F_{E_1}^{E_2}$(id1, t1, ρf, id2, t2) ∈ De, then by constraint (3) the tuples E1(id1, t1, u1, ..., un, g1) and E2(id2, t2, v1, ..., vm, g2) must also belong to De. These three tuples trigger constraints (5) and (6). However, g1 and g2 cannot be both equal and disjoint. Therefore, the set ψ of constraints is not strongly satisfiable. □

Strong satisfiability involves checking spatial relationships between geometries, which requires checking the consistency of topological relations as expressed by the composition of topological relations [12]. Moreover, this relates to the problem of satisfiability of a set of topological constraints [19]. Previous work on the satisfiability of constraints involving topological relations [3] shows that this is, in general, a hard problem; however, there exist subsets of constraints for which consistency can be checked in polynomial time.

4.3. Checking consistency of a spatio-temporal database

An implementation of a spatio-temporal database on an extended relational database (DB) needs to check the satisfaction of MCs and STSCs. Some of these constraints, such as the primary and foreign key model constraints, can already be enforced by database management systems, since they correspond to classical primary and foreign keys of relational databases. Therefore, this section concentrates on techniques for checking MCs of type FC and STSCs.

Naive approach. One technique to check the satisfaction of constraints is to generate queries that return the inconsistent tuples. If the answer to such a query is empty, the database instance satisfies the constraints.
For example, given a schema Σ = (RE, RΦ) such that $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] ∈ RΦ and a filiation constraint $\varphi = \bar{\forall}\big(F_{E_1}^{E_2}(id_1, t_1, \rho, id_2, t_2) \rightarrow (id_1 \neq id_2) \wedge (t_1 \leq t_2)\big)$, an SQL query to check whether ϕ is satisfied is the following QFC(ϕ), where F_E1_E2 stands for table $F_{E_1}^{E_2}$:

```sql
SELECT *
FROM   F_E1_E2
WHERE  F_E1_E2.ID1 = F_E1_E2.ID2
   OR  F_E1_E2.Time1 > F_E1_E2.Time2;
```

If QFC(ϕ) is empty, the database instance satisfies constraint ϕ. However, if there is a tuple (id1, t1, ρ, id2, t2) in the answer to QFC(ϕ), then ϕ is not satisfied, and either id1 = id2 or t1 > t2. We refer to QFC(ϕ) as a check-query for ϕ. A naive way of checking constraints is to generate one check-query for each constraint. If all the queries return an empty answer, then the database satisfies the constraints. However, this approach does not take advantage of the fact that similar queries can be combined.

Optimized approach. For optimization purposes, it is possible to define check-queries that determine the satisfaction of several constraints at the same time, or that aggregate several conditions that trigger a constraint. As shown in the experimental section, this combination increases the efficiency of checking whether a database satisfies a set of constraints. For IEDs, a simple combination groups constraints over the same entity and filiation tables and the same temporal comparison, but with different values for the derivation relation. Consider $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] ∈ RΦ, E2[ID, Time, U1, ..., Un, G] ∈ RE, and a set ψ of IEDs of the form (IED-1) defined over them. Let {ρ1, ..., ρm} be the values of attribute Φ in the constraints of ψ. Then, an SQL check-query for all (IED-1) constraints in ψ is the following QIED-1(ψ), where F_E1_E2 stands for table $F_{E_1}^{E_2}$, attribute Φ is written Phi, and ρ1, ..., ρm denote the corresponding constants:

```sql
SELECT *
FROM   F_E1_E2, E2
WHERE  (F_E1_E2.Phi = ρ1 OR ... OR F_E1_E2.Phi = ρm)
  AND  F_E1_E2.ID2 = E2.ID
  AND  F_E1_E2.Time2 > E2.Time;
```
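The check-queries QFC and QIED-1 can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions that are not part of the original: a single entity table P plays the roles of both E1 and E2, attribute Φ is stored as a text column Phi, the ρ values are hypothetical strings such as 'rho_sp', and geometries are omitted because neither query uses them. Following the optimized approach, check_ied1 (a hypothetical name) binds all derivation types in one IN list instead of issuing one query per type. Table names are interpolated into the SQL, so they must be trusted identifiers.

```python
import sqlite3


def check_fc(conn, f_table):
    """Q_FC: filiation tuples violating (id1 != id2) and (t1 <= t2)."""
    sql = f"SELECT * FROM {f_table} WHERE ID1 = ID2 OR Time1 > Time2"
    return set(conn.execute(sql).fetchall())


def check_ied1(conn, f_table, e2_table, rhos):
    """Combined Q_IED-1: one query for every derivation type listed in `rhos`."""
    marks = ", ".join("?" for _ in rhos)
    sql = (f"SELECT F.ID1, F.Time1, F.Phi, F.ID2, F.Time2, E.Time "
           f"FROM {f_table} AS F JOIN {e2_table} AS E ON F.ID2 = E.ID "
           f"WHERE F.Phi IN ({marks}) AND F.Time2 > E.Time")
    return set(conn.execute(sql, list(rhos)).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (ID TEXT, Time INTEGER, PRIMARY KEY (ID, Time));  -- PK enforced by the DBMS
    CREATE TABLE F_P_P (ID1 TEXT, Time1 INTEGER, Phi TEXT, ID2 TEXT, Time2 INTEGER);
    INSERT INTO P VALUES ('a', 1), ('b', 1), ('b', 2), ('c', 2), ('d', 3);
    INSERT INTO F_P_P VALUES ('a', 1, 'rho_sp', 'b', 2), ('a', 1, 'rho_sp', 'c', 2),
                             ('d', 3, 'rho_me', 'd', 3), ('c', 2, 'rho_me', 'a', 1);
""")
print(check_fc(conn, "F_P_P"))                              # id1 = id2, or t1 > t2
print(check_ied1(conn, "F_P_P", "P", ["rho_sp", "rho_me"]))  # b already existed at time 1
```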

A very similar check-query exists for IED-2: it replaces E2 by E1 in the FROM and WHERE clauses, ID2 by ID1 in the join condition, and the condition $F_{E_1}^{E_2}$.Time2 > E2.Time by $F_{E_1}^{E_2}$.Time1 < E1.Time. Due to the generality of TFDs, the optimization of their check-queries is more complex and depends on the operators θ1 and θ2 in the constraints. When θ1 and θ2 are aggregation operators grouped by the entities that participate in the derivation relations, it is possible to avoid the join queries of the naive approach, which need to combine the tuples of the entity tables referenced by the tuples of the filiation tables of interest. The basic idea is to group the geometries to which the aggregation operators θ1 and θ2 must be applied. In what follows, check-queries are specified for particular TFDs that use the spatial aggregation GeomUnion for θ1 or θ2, grouped by the entity in the filiation table that derives (or is derived by) other entities. We restrict ourselves, however, to two subtypes of TFDs that allow the representation of merging, splitting, annexation and separation between entities of the same or different types, and where an entity can split into several entities or several entities can merge into one. More formally, let us consider TFDs of the following forms:

$$\bar{\forall}\Big(E_1(id_1, t_1, \bar{u}_1, g_1) \wedge \bigwedge_{i=2}^{n} E_2(id_i, t_i, \bar{u}_i, g_i) \wedge \bigwedge_{k \in [2,n]} F_{E_1}^{E_2}(id_1, t_1, \rho_f, id_k, t_k) \wedge \bigwedge_{l,m \in [1,n]} (id_l \neq id_m) \rightarrow T(g_1, \mathrm{GeomUnion}(\bar{g}))\Big) \qquad \text{(TFD-1)}$$

$$\bar{\forall}\Big(E_1(id_1, t_1, \bar{u}_1, g_1) \wedge \bigwedge_{i=2}^{n} E_2(id_i, t_i, \bar{u}_i, g_i) \wedge \bigwedge_{k \in [2,n]} F_{E_2}^{E_1}(id_k, t_k, \rho_f, id_1, t_1) \wedge \bigwedge_{l,m \in [1,n]} (id_l \neq id_m) \rightarrow T(g_1, \mathrm{GeomUnion}(\bar{g}))\Big) \qquad \text{(TFD-2)}$$

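where {E1, E2} ⊆ RE; {$F_{E_1}^{E_2}$, $F_{E_2}^{E_1}$} ⊆ RΦ; T ∈ Γ; ρf ∈ Φ; and ḡ is a sequence of geometries in {g2, ..., gn}. Notice that we take θ1 to be the identity and, therefore, replace θ1({g1}) by g1. Now, consider $F_{E_1}^{E_2}$[ID1, Time1, Φ, ID2, Time2] ∈ RΦ, E1[ID, Time, U1, ..., Un, G1] and E2[ID, Time, V1, ..., Vm, G2] ∈ RE, and a set ψ of TFD-1 constraints. Then, for T = EQ, a Spatial SQL check-query for the constraints of the form (TFD-1) in ψ is the following QTFD-1(ψ), where F_E1_E2 stands for table $F_{E_1}^{E_2}$, attribute Φ is written Phi, and ρf is a constant. Since a check-query must return the inconsistent tuples, the final condition selects the groups whose union is not equal to the geometry of the deriving entity:

```sql
SELECT *
FROM   E1,
       (SELECT F_E1_E2.ID1, F_E1_E2.Time1, GeomUnion(E2.G2)
        FROM   F_E1_E2, E2
        WHERE  F_E1_E2.Phi = ρf
          AND  E2.ID = F_E1_E2.ID2
          AND  E2.Time = F_E1_E2.Time2
        GROUP  BY F_E1_E2.ID1, F_E1_E2.Time1) AS E(ID, Time, G)
WHERE  E1.ID = E.ID
  AND  E1.Time = E.Time
  AND  NOT EQUALS(E1.G1, E.G);
```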
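To make the grouping behind QTFD-1 concrete, here is a minimal Python sketch of the same plan: filter the filiation tuples by ρf, join them with E2, aggregate the derived geometries per (ID1, Time1), join back with E1 and keep the violations. It is illustrative only: geometries are hypothetically frozensets of grid cells, so GeomUnion is set union and the default predicate is EQ (set equality); check_tfd1 and its parameters are hypothetical names. Because the grouping absorbs any number of derived entities, a split into three parts needs no separate query.

```python
from collections import defaultdict


def check_tfd1(e1, e2, filiations, rho_f, pred=lambda g, u: g == u):
    """Return keys (ID1, Time1) of E1 tuples whose geometry g fails pred(g, union).

    Mirrors Q_TFD-1: filter F by rho_f, join with E2, GROUP BY (ID1, Time1) with
    a union aggregate, join back with E1 and keep the violations. Geometries are
    hypothetical frozensets of grid cells; the default pred plays the role of EQ.
    """
    g2_of = {(i, t): g for i, t, g in e2}
    grouped = defaultdict(frozenset)                 # (ID1, Time1) -> GeomUnion(G2)
    for id1, t1, rho, id2, t2 in filiations:
        if rho == rho_f and (id2, t2) in g2_of:      # WHERE Phi = rho_f AND join with E2
            grouped[(id1, t1)] |= g2_of[(id2, t2)]
    return {(i, t) for i, t, g in e1
            if (i, t) in grouped and not pred(g, grouped[(i, t)])}


parcels_t1 = {("A", 1, frozenset({(0, 0), (0, 1), (1, 0)})),
              ("B", 1, frozenset({(5, 5), (5, 6)}))}
parcels_t2 = {("A1", 2, frozenset({(0, 0)})), ("A2", 2, frozenset({(0, 1)})),
              ("A3", 2, frozenset({(1, 0)})), ("B1", 2, frozenset({(5, 5)}))}
fil = {("A", 1, "rho_sp", "A1", 2), ("A", 1, "rho_sp", "A2", 2),
       ("A", 1, "rho_sp", "A3", 2), ("B", 1, "rho_sp", "B1", 2)}
print(check_tfd1(parcels_t1, parcels_t2, fil, "rho_sp"))   # {('B', 1)}
```

Parcel A splits into three parts whose union equals its geometry, so it passes; parcel B loses a cell and is reported.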

In QTFD-1(ψ), E(ID, Time, G) is a temporary (derived) table created by aggregating the geometries of the derived entities, grouped by the entity that derives them. EQUALS is the name of the topological predicate EQ in current Spatial SQL dialects. Note that this check-query handles a variable number of derived entities; that is, it can represent the split of a geometry into two, three or more geometries, which in the naive approach would require a separate query for each number of derived entities. A similar SQL check-query QTFD-2(ψ) can be defined for a set of TFD-2 constraints, where the geometries of the deriving entities are aggregated, grouped by the entity they derive. For other types of TFDs, as in the case of IEDs, it is possible to combine TFDs of the same form but with different derivation relations into one check-query.

5. Consistency checking experimental evaluation

This section develops an experimental evaluation that illustrates the potential of the modeling approach and tests the performance of the queries that check the consistency of a database instance. We compare the naive approach, where there is one SQL query for each constraint, with the optimized approach described in Section 4.3, where more than one constraint is checked by a single SQL query. The approach is evaluated on a cadastral application of the Canton de Neufchatel in Switzerland, whose data set consists of a series of time instants. This data set is composed of seven snapshots (time instants) describing the land parcels in the region from 2003 to 2012. Figure 12 displays the parcels' geometries for the first snapshot of the data set, together with the number of parcels existing at each time instant. The experiment requires generating a spatio-temporal database as described in Section 3.1 and a set of constraints that are appropriate for the given cadastral information. Since the original data has no explicit filiation relations, some relations must be added artificially by considering particular relationships between geometries. The following sections describe how the databases are generated and which constraints are appropriate for the context of cadastral information of parcels. Finally, the results of our experiments are presented.

Figure 12: (a) Geographical data visualization at time instant 1 (map not reproduced) and (b) number of parcels per time instant:

Time instant     1     2     3     4     5     6     7
N° of parcels  5939  5950  5943  5943  5968  5982  5986

5.1. Spatio-temporal database generation

To represent the data set, the spatio-temporal schema needs an entity table Parcel to store the parcel information and a filiation table $F_{\mathrm{Parcel}}^{\mathrm{Parcel}}$ to store the filiation relationships between parcels. Namely, we consider a schema Σparcel = (RE, RΦ) with RE = {Parcel[ID, Time, G]} and RΦ = {$F_{\mathrm{Parcel}}^{\mathrm{Parcel}}$[ID1, Time1, Φ, ID2, Time2]}. To simplify the presentation, we refer to Parcel and $F_{\mathrm{Parcel}}^{\mathrm{Parcel}}$ simply as P and $F_P^P$, respectively.

To generate a database instance for schema Σparcel, we take the data from the cadastral snapshots and transform it so that it complies with the requirements of a spatio-temporal database. This requires some assumptions and techniques that let us generate database instances with different levels of inconsistency, which we use to test the performance of the check-queries. All these database instances share the same P table but have different filiation relations in table FPP. The entity table P was populated as follows:

(i) The values of attributes Time and G were taken directly from the given snapshots.
(ii) Assigning values to attribute ID was not straightforward, since the snapshots used different identifiers for all parcels at every time instant. To create attribute ID for P, we considered that parcels with the same geometry at consecutive time instants hold a continuation relation and, therefore, received the same value of ID.
(iii) Parcels at a time instant that did not exist at the previous time instant were inserted as new tuples into P with a novel identifier that was not used at previous time instants.

The filiation table FPP cannot be extracted from the cadastral data set, since the original data contain no explicit filiation relations between snapshots. We therefore created derivation relations that can be reasonably deduced from the geometry of parcels at consecutive time instants, using a parameter α ∈ (0, 1) that allowed us to generate different numbers of filiation relations. For every α, we denote by FPP(α) the instance obtained with that parameter, and by Dα the corresponding database instance, which contains the instance of table P and FPP(α). Given α ∈ (0, 1), the filiation table FPP(α) is populated by applying the following rules to every pair of tuples P(idi, ti, gi) and P(idj, ti+1, gj) in the entity table P:

1. If PPi(gi, gj) and not EQ(gi, gj), then (idi, ti, ρsp, idj, ti+1) ∈ FPP(α).
2. If PO(gi, gj), Area(gi) > Area(gj) and Area(Intersection(gi, gj)) > α × Area(gj), then (idi, ti, ρse, idj, ti+1) ∈ FPP(α).
3. If PP(gi, gj) and not EQ(gi, gj), then (idi, ti, ρme, idj, ti+1) ∈ FPP(α).
4. If PO(gi, gj), Area(gj) > Area(gi) and Area(Intersection(gi, gj)) > α × Area(gi), then (idi, ti, ρan, idj, ti+1) ∈ FPP(α).

Note that these rules are not sufficient to guarantee that the derivation relations correspond exactly to splitting, separation, merging, and annexation. However, they provide a strategy for defining derivation relations with which to carry out an experimental evaluation of the check-queries. Parameter α in these rules is a threshold (between 0 and 1) on the overlapping area above which a relation ρse or ρan is created. Different values of α produce different numbers of filiation relationships between entities. Table 2 describes the properties of the spatio-temporal database instances D0.5, D0.7 and D0.9, generated with α = 0.5, α = 0.7 and α = 0.9, respectively.

Property            D0.5    D0.7    D0.9
Identities (IDs)    7542    7542    7542
ρc relations        5949    5949    5949
ρsp relations        151     151     151
ρse relations        774     770     754
ρme relations        167     167     167
ρan relations        543     534     520
Size (MB)           30.1    30.1    30.1

Table 2: Spatio-temporal database properties generated for different values of α
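As an illustration only, the following Python sketch applies the ID-assignment steps (ii) and (iii) and the four filiation rules to toy data. It assumes a hypothetical raster simplification in which each geometry is a frozenset of unit grid cells, so that Area is a cell count, Intersection is set intersection, and PP, PPi, PO and EQ reduce to set comparisons. The helper names (`cells`, `assign_ids`, `derive_filiations`) and the relation labels "sp", "se", "me", "an" are hypothetical; a real pipeline would evaluate these predicates with a spatial library or DBMS.

```python
from itertools import count


# Hypothetical raster model: a geometry is a frozenset of (x, y) unit cells,
# so Area is a cell count and the topological predicates are set comparisons.
def cells(x0, x1, y0=0, y1=1):
    """Axis-aligned block of unit cells [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def area(g):
    return len(g)


def proper_part(a, b):
    """PP(a, b): a is strictly contained in b (so EQ(a, b) is false)."""
    return a < b


def partial_overlap(a, b):
    """PO(a, b): a and b share cells but neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a


def assign_ids(snapshots):
    """Steps (ii)-(iii): snapshots[t] is the list of parcel geometries at t.
    Returns the entity table P as (id, time, geom) tuples."""
    fresh = count(1)
    P, previous = [], {}
    for t, geoms in enumerate(snapshots):
        current = {}
        for g in geoms:
            # Same geometry as at t - 1: continuation, so keep the identifier;
            # otherwise the parcel is new and receives an unused identifier.
            ident = previous[g] if g in previous else next(fresh)
            current[g] = ident
            P.append((ident, t, g))
        previous = current
    return P


def derive_filiations(P, alpha):
    """Rules 1-4 applied to every pair of tuples at consecutive instants."""
    F = set()
    for (i, ti, gi) in P:
        for (j, tj, gj) in P:
            if tj != ti + 1:
                continue
            overlap = area(gi & gj)
            if proper_part(gj, gi):                    # rule 1: PPi(gi, gj)
                F.add((i, ti, "sp", j, tj))
            elif proper_part(gi, gj):                  # rule 3: PP(gi, gj)
                F.add((i, ti, "me", j, tj))
            elif partial_overlap(gi, gj):
                if area(gi) > area(gj) and overlap > alpha * area(gj):
                    F.add((i, ti, "se", j, tj))        # rule 2: separation
                elif area(gj) > area(gi) and overlap > alpha * area(gi):
                    F.add((i, ti, "an", j, tj))        # rule 4: annexation
    return F


# Usage: one parcel of 4 cells at t = 0 is split into two halves at t = 1.
P = assign_ids([[cells(0, 4)], [cells(0, 2), cells(2, 4)]])
print(sorted(derive_filiations(P, alpha=0.5)))
# [(1, 0, 'sp', 2, 1), (1, 0, 'sp', 3, 1)]
```

Raising α in this sketch removes ρse and ρan tuples whose overlap falls below the threshold, while ρsp and ρme tuples are unaffected, which is the same qualitative pattern reported in Table 2.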

5.2. Integrity constraints

Let us define a set of semantic integrity constraints that are of interest in the context of the given cadastral dataset. Let Ψparcel contain:

1. Identity-Existence Dependency (IED):

ϕ1: $\bar{\forall}\,(P(i_1, t_3, g_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_2, t_2) \rightarrow (t_3 \leq t_1))$
ϕ2: $\bar{\forall}\,(P(i_1, t_3, g_2) \wedge F^P_P(i_1, t_1, \rho_{me}, i_2, t_2) \rightarrow (t_3 \leq t_1))$
ϕ3: $\bar{\forall}\,(P(i_2, t_3, g_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_2, t_2) \rightarrow (t_3 \geq t_2))$
ϕ4: $\bar{\forall}\,(P(i_2, t_3, g_2) \wedge F^P_P(i_1, t_1, \rho_{me}, i_2, t_2) \rightarrow (t_3 \geq t_2))$

IEDs ϕ1 to ϕ4 ensure that an entity that splits, or entities that merge, cease to exist, and that derived entities did not exist before. Notice that, as in Examples 5 and 7, we do not impose any IED for derivations ρse and ρan, since the way the database instance is constructed guarantees that these constraints are satisfied.

2. Topological-Filiation Dependency (TFD):

ϕ5: $\bar{\forall}\,(P(i_1, t_1, g_1) \wedge P(i_2, t_2, g_2) \wedge P(i_3, t_2, g_3) \wedge F^P_P(i_1, t_1, \rho_{se}, i_2, t_2) \wedge F^P_P(i_1, t_1, \rho_{se}, i_3, t_2) \wedge i_2 \neq i_3 \rightarrow \mathit{Touches}(g_2, g_3))$
ϕ6: $\bar{\forall}\,(P(i_1, t_1, g_1) \wedge P(i_2, t_1, g_2) \wedge P(i_3, t_2, g_3) \wedge F^P_P(i_1, t_1, \rho_{an}, i_3, t_2) \wedge F^P_P(i_2, t_1, \rho_{an}, i_3, t_2) \wedge i_1 \neq i_2 \rightarrow \mathit{Touches}(g_1, g_2))$
ϕ7: $\bar{\forall}\,(P(i_1, t_1, g_1) \wedge P(i_2, t_2, g_2) \wedge P(i_3, t_2, g_3) \wedge P(i_4, t_2, g_4) \wedge P(i_5, t_2, g_5) \wedge P(i_6, t_2, g_6) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_2, t_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_3, t_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_4, t_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_5, t_2) \wedge F^P_P(i_1, t_1, \rho_{sp}, i_6, t_2) \rightarrow \mathit{Equals}(g_1, \mathit{GeomUnion}(g_2, g_3, g_4, g_5, g_6)))$
ϕ8: $\bar{\forall}\,(P(i_1, t_1, g_1) \wedge P(i_2, t_1, g_2) \wedge P(i_3, t_1, g_3) \wedge P(i_4, t_1, g_4) \wedge P(i_5, t_1, g_5) \wedge P(i_6, t_2, g_6) \wedge F^P_P(i_1, t_1, \rho_{me}, i_6, t_2) \wedge F^P_P(i_2, t_1, \rho_{me}, i_6, t_2) \wedge F^P_P(i_3, t_1, \rho_{me}, i_6, t_2) \wedge F^P_P(i_4, t_1, \rho_{me}, i_6, t_2) \wedge F^P_P(i_5, t_1, \rho_{me}, i_6, t_2) \rightarrow \mathit{Equals}(g_6, \mathit{GeomUnion}(g_1, g_2, g_3, g_4, g_5)))$
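The IEDs ϕ1 to ϕ4 can be read as violation searches: a violation of ϕ1 or ϕ2 is a tuple of the parent i1 at some time t3 > t1, and a violation of ϕ3 or ϕ4 is a tuple of the derived entity i2 at some time t3 < t2. The brute-force Python sketch below illustrates this reading over in-memory tuples. It assumes that P holds (id, time, geom) triples and FPP holds (id1, t1, rel, id2, t2) tuples, with hypothetical relation labels "sp" and "me"; it is not the SQL check-queries used in the experiments.

```python
def ied_violations(P, F):
    """Brute-force check of phi1-phi4.

    P: iterable of (id, time, geom) tuples (geometries are ignored here).
    F: iterable of (id1, t1, rel, id2, t2) filiation tuples.
    Returns a dict mapping each constraint name to its violating (id, time) pairs.
    """
    parent_rule = {"sp": "phi1", "me": "phi2"}   # parent must not exist after t1
    child_rule = {"sp": "phi3", "me": "phi4"}    # child must not exist before t2
    violations = {name: set() for name in ("phi1", "phi2", "phi3", "phi4")}
    P = list(P)
    for (i1, t1, rel, i2, t2) in F:
        if rel not in parent_rule:
            continue  # no IED is imposed for 'se' and 'an'
        for (i, t3, _geom) in P:
            if i == i1 and t3 > t1:
                violations[parent_rule[rel]].add((i, t3))
            if i == i2 and t3 < t2:
                violations[child_rule[rel]].add((i, t3))
    return violations


# Usage: parcel 1 splits at t = 0 into parcel 2, but parcel 1 still exists at t = 1.
P = [(1, 0, None), (1, 1, None), (2, 1, None)]
print(ied_violations(P, [(1, 0, "sp", 2, 1)])["phi1"])  # {(1, 1)}
```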

TFDs ϕ5 and ϕ6 are introduced as examples of constraints that may arise from the way we define separation and annexation in our dataset. TFD ϕ5 states that if an entity separates into other entities, these derived entities should touch. TFD ϕ6 states that if several entities are annexed to the same entity, they should touch. Both constraints impose a form of spatial closeness on separation and annexation, and they help to illustrate other forms of TFD. Finally, constraints ϕ7 and ϕ8 are the splitting and merging constraints of Examples 4 and 6, adapted to the context of our application. Table 3 shows, for each database instance generated in Section 5.1, the number of inconsistencies found for constraints ϕ1 to ϕ8.

STSC    Inconsistencies
        D0.5    D0.7    D0.9
ϕ1       136     136     136
ϕ2       167     167     167
ϕ3       151     151     151
ϕ4       161     161     161
ϕ5        44      42      32
ϕ6       132      82      28
ϕ7       134     134     134
ϕ8       159     159     159

Table 3: Number of inconsistencies for every Dα and constraint in Ψparcel
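Under a hypothetical raster simplification (each geometry is a frozenset of unit cells), the TFDs can be sketched in Python as follows. Touches is approximated as "no shared cell, but some pair of cells is adjacent", counting diagonal neighbours because closed unit squares that share only a corner touch; Equals becomes set equality and GeomUnion becomes set union. For ϕ7 and ϕ8 the sketch groups all children of a split (or all parents of a merge) and compares one aggregated union with the other side, which is an assumed aggregation-based reading of these constraints rather than a literal translation of their five-variable form. All function names are illustrative.

```python
from collections import defaultdict


def cells(x0, x1, y0=0, y1=1):
    """Hypothetical raster geometry: unit cells [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def touches(a, b):
    """No shared cell, but some cell of a is 8-adjacent to a cell of b."""
    if a & b:
        return False
    return any((x + dx, y + dy) in b
               for (x, y) in a for dx in (-1, 0, 1) for dy in (-1, 0, 1))


def group(F, rel, by_parent):
    """Group filiation tuples with relation rel by parent (or by child)."""
    groups = defaultdict(set)
    for (i1, t1, r, i2, t2) in F:
        if r == rel:
            if by_parent:
                groups[(i1, t1)].add((i2, t2))
            else:
                groups[(i2, t2)].add((i1, t1))
    return groups


def check_touches(P, F, rel):
    """phi5 (rel='se'): children of a separation must touch pairwise.
    phi6 (rel='an'): entities annexed to the same entity must touch pairwise."""
    G = {(i, t): g for (i, t, g) in P}
    bad = set()
    for key, members in group(F, rel, by_parent=(rel == "se")).items():
        members = sorted(members)
        for k, m in enumerate(members):
            for n in members[k + 1:]:
                if not touches(G[m], G[n]):
                    bad.add((key, m, n))
    return bad


def check_union(P, F, rel):
    """phi7 (rel='sp'): parent equals the union of its children.
    phi8 (rel='me'): merged entity equals the union of its parents.
    One aggregated union per group, instead of fixed-size tuple combinations."""
    G = {(i, t): g for (i, t, g) in P}
    return {key for key, members in group(F, rel, by_parent=(rel == "sp")).items()
            if G[key] != frozenset().union(*(G[m] for m in members))}


# Usage: a parcel split into two halves that exactly cover it satisfies phi7.
P = [(1, 0, cells(0, 4)), (2, 1, cells(0, 2)), (3, 1, cells(2, 4))]
F = [(1, 0, "sp", 2, 1), (1, 0, "sp", 3, 1)]
print(check_union(P, F, "sp"))  # set()
```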

5.3. Experimental results

The experimental study compares the running time of checking whether a database satisfies a set of constraints under the naive and the optimized approaches defined in Section 4.2. The experiments were run on a server with two quad-core Intel Xeon E5620 processors (2.40 GHz, 12 MB L3 cache) and 32 GB of RAM. Constraints ϕ5 and ϕ6 cannot be combined with other constraints; therefore, their results are the same for the naive and optimized approaches. For the other constraints, check-queries can be optimized either by combining constraints that have the same form of IED and differ only in the derivation relation (i.e., ϕ1 with ϕ2, and ϕ3 with ϕ4), or by aggregating the geometries that must satisfy a TFD (constraints ϕ7 and ϕ8).
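To make the naive/optimized distinction concrete, the Python sketch below runs check-queries for ϕ1 and ϕ2 against an in-memory SQLite database, first as two separate queries (one per derivation relation) and then as a single combined query, in the spirit of QIED-1. Assumptions: time instants are integers, relation labels are stored as the strings 'sp' and 'me', geometries are omitted, and the violation condition o.time > r.t1 is taken directly from ϕ1 and ϕ2. With integer instants and t2 = t1 + 1, as in the generated filiations, this selects the same tuples as the condition o2.time >= r1.t2 used in QIED-1. Table and column names follow the check-queries of this section; the rest is illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE P   (id INTEGER, time INTEGER);
CREATE TABLE FPP (id1 INTEGER, t1 INTEGER, rel TEXT, id2 INTEGER, t2 INTEGER);
"""

# Naive approach: one check-query per constraint (phi1 for 'sp', phi2 for 'me').
NAIVE = """
SELECT DISTINCT r.id1, r.t1 FROM P AS o, FPP AS r
WHERE r.rel = ? AND o.id = r.id1 AND o.time > r.t1
"""

# Combined check-query for {phi1, phi2}: both share the same IED form.
COMBINED = """
SELECT DISTINCT r.id1, r.t1 FROM P AS o, FPP AS r
WHERE r.rel IN ('sp', 'me') AND o.id = r.id1 AND o.time > r.t1
"""


def build(p_rows, f_rows):
    """Create an in-memory database with tables P and FPP."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO P VALUES (?, ?)", p_rows)
    con.executemany("INSERT INTO FPP VALUES (?, ?, ?, ?, ?)", f_rows)
    return con


def naive_check(con):
    found = set()
    for rel in ("sp", "me"):  # two query executions
        found |= set(con.execute(NAIVE, (rel,)).fetchall())
    return found


def combined_check(con):
    return set(con.execute(COMBINED).fetchall())  # one query execution


con = build(
    p_rows=[(1, 0), (1, 1), (2, 1), (3, 0), (4, 0), (5, 1)],
    f_rows=[(1, 0, "sp", 2, 1),   # parent 1 still exists at t = 1: violates phi1
            (3, 0, "me", 5, 1),   # parents 3 and 4 merge into 5: consistent
            (4, 0, "me", 5, 1)],
)
print(naive_check(con) == combined_check(con), combined_check(con))
# True {(1, 0)}
```

Both variants return the same violating parent tuples; the combined form simply lets the DBMS answer the two constraints with a single query.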

Tables 4 and 5 list the check-queries, written against the schema of the database instance. To keep the notation consistent, QTFD-3(·) and QTFD-4(·) denote check-queries for constraints of the form of ϕ5 and ϕ6, which are TFDs specific to this application that were not defined previously. Table 6 shows the time cost of the naive and optimized approaches for checking the consistency of database instances D0.5, D0.7 and D0.9. These results show that the optimized check-queries outperform the naive ones, and the improvement is particularly significant for check-queries of type QTFD-1(·).

QIED-1({ϕ1, ϕ2}):
SELECT o1.id, o1.time
FROM P as o1, P as o2, FPP as r1
WHERE (r1.rel = ρsp or r1.rel = ρme) and r1.id1 = o1.id and r1.t1 = o1.time and o2.id = o1.id and o2.time >= r1.t2;

QIED-2({ϕ3, ϕ4}):
SELECT o1.id, o1.time
FROM P as o1, P as o2, FPP as r1
WHERE (r1.rel = ρsp or r1.rel = ρme) and r1.id2 = o1.id and r1.t2 = o1.time and o2.id = o1.id and o2.time
