An Extended Algebra for Constraint Databases


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. Y, MONTH 1999

Alberto Belussi, Elisa Bertino, Barbara Catania

Abstract: Constraint relational databases use constraints to both model and query data. A constraint relation contains a finite set of generalized tuples. Each generalized tuple is represented by a conjunction of constraints on a given logical theory and, depending on the logical theory and the specific conjunction of constraints, it may represent an infinite set of relational tuples. Because of these characteristics, constraint databases are well suited to modeling multidimensional and structured data, such as spatial and temporal data. The definition of an algebra for constraint relational databases is important in order to make constraint databases a practical technology. In this paper, we extend the previously defined constraint algebra (called generalized relational algebra). First, we show that the relational model is not the only possible semantic reference model for constraint relational databases and we show how constraint relations can be interpreted under the nested relational model. Then, we introduce two distinct classes of constraint algebras, one based on the relational algebra and one based on the nested relational algebra, and we present an algebra of the latter type. The algebra is proved equivalent to the generalized relational algebra when input relations are modified by introducing generalized tuple identifiers. However, it is more suitable from a user point of view. Thus, the difference between such algebras is similar to the difference between the relational algebra and the nested relational algebra, dealing with only one level of nesting. We also show how external functions can be added to the proposed algebra.

Keywords: Constraints, generalized relations, relational algebra, nested relational algebra, external functions.

I. Introduction

Constraint programming is a completely declarative paradigm by which computations are described by specifying how they are constrained. The idea of programming with constraints is not new. In Artificial Intelligence, constraints have been used for many years and several proposals have been developed. The main idea of constraint languages is to state a set of relations (constraints) among a set of objects in a given domain. It is the task of the constraint satisfaction system (or constraint solver) to find a solution satisfying these relations. Constraint programming is very attractive from a database point of view, since it is completely declarative and since constraints often represent the communication language of several high-level applications. During the last few years, a lot of work has been done in order to introduce constraints in both relational [16], [20], [23], [24] and object-oriented databases [8], [21]. In this paper, we only consider relational databases. Constraints can be added to relational database systems at different levels. At the data level, they are able to finitely represent possibly infinite sets of relational tuples.

Thus, constraints are a powerful mechanism for modeling spatial and temporal concepts, where infinite information often has to be represented. Indeed, spatial objects can be seen as composed of an infinite set of points, corresponding to the solutions of particular mathematical constraints. For example, the constraint $X^2 + Y^2 \leq 9$ represents a circle with center in the point $(0, 0)$ and with radius equal to 3. From a temporal perspective, constraints are very useful in representing situations that are related to a given period of time. For example, we may think of a train standing at a transit station each day for the same period of time. With respect to data modeling, the main advantage in using constraints is that they serve as a unifying data type for the (conceptual) representation of heterogeneous data. In particular, the benefit of this approach is emphasized when complex knowledge (for example, spatial or temporal data) has to be combined with some descriptive non-structured information (such as names or figures). At the query language level, constraints increase the expressive power of simple relational languages by allowing mathematical computations. In this respect, constraints have also been used in multimedia database languages, to model both temporal synchronization properties and spatial layout properties for the presentation of multimedia objects resulting from the query evaluation [29]. The integration of constraints in existing query languages introduces several issues. Indeed, constraint query languages should preserve all the nice features supported by relational languages. For example, they should be closed (footnote 1) and bottom-up evaluable. The first general design principle to make the integration of constraints and database technology feasible has been proposed in [24], where a general framework for constraint query languages has been defined.
The framework is based on the simple idea that a constraint can be seen as an extension of a relational tuple, or, vice versa, that a relational tuple can be interpreted as a conjunction of equality constraints. The new constraint tuples are called generalized and are represented by finite quantifier-free conjunctions of constraints on a given decidable logical theory. Each generalized tuple finitely represents a possibly infinite set of relational tuples, called the extension of the generalized tuple, one for each assignment that makes the generalized tuple true in the domain of the chosen theory. In the same paper, a calculus for constraint databases has been proposed and shown to be tractable from a computational point of view. The obtained model is called the generalized relational model. As for the relational model, the correct formalism to obtain both a formal specification of the language and a suitable basis to handle implementation is the definition of an

Alberto Belussi is with the Department of Electronics and Information Science, Polytechnic of Milan, Piazza L. da Vinci 32, 20133 Milano, Italy, e-mail: [email protected]. Elisa Bertino and Barbara Catania are with the Department of Computer Science, University of Milan, Via Comelico 39/41, 20135 Milano, Italy, e-mail: fbertino,[email protected].

Footnote 1: Note that algebraic languages are closed by definition, but this is not always true for calculus-based languages.

A. BELUSSI ET AL.: AN EXTENDED ALGEBRA FOR CONSTRAINT DATABASES

algebra. In particular, the relational algebra can be easily extended to deal with generalized relations. The obtained algebra is called generalized relational algebra [16], [23], [31]. The main principles underlying the algebraic approach have been discussed in [16], [23].

Motivations: Among the various topics that should be investigated to make constraint databases a practical technology, we believe that there are at least two issues to consider from a modeling point of view:

• The basic idea underlying the generalized relational model is to consider a generalized relation as the finite representation of a possibly infinite set of relational tuples [24]. We believe, however, that the relational model is not the only way to assign a semantics to generalized relations. In particular, generalized relations can be interpreted under the nested relational model [1], [2]. Indeed, each generalized relation can be seen as a finite set of (possibly) infinite sets, each representing the extension of a single generalized tuple contained in the considered relation. Note that, under this interpretation, generalized relations model a very simple kind of nesting. The use of a different semantic reference model for constraint databases leads to the definition of new languages. In this respect, the type of sets that can be modeled by a single generalized tuple becomes important. New connectives, and not only conjunction as proposed in [24], can be used. For example, to model concave sets inside a generalized tuple, disjunction must be used.

• The second issue is related to applications. We believe that each application needs some specific procedures. These procedures may either not be expressible in the constraint language or require a specific implementation. In both cases, the use of external functions is an important topic. Here, the issues are related to how functions can be added to a constraint language.
Contributions: The aim of this paper is to present the nested relational model as a new reference model for generalized relational databases and to investigate algebraic languages based on such a model. In particular:

1. We extend the notion of generalized tuple to deal with more general sets of logical operators (for example, containing disjunction). Then, we introduce the nested relational model as a new semantic reference model for generalized relational databases.

2. We show that, in order to take advantage of the nested relational model as semantic reference model, the generalized relational algebra should be extended with operators handling each generalized tuple as a set, and thus as a single object. In particular, we define two classes of algebras (r-based and n-based) and we prove several interesting properties relating the two classes. R-based algebras extend the usual relational algebra to constraint databases; n-based algebras are based on the nested relational algebra, defined for complex objects. A specific n-based language is also presented and proved equivalent to the generalized relational algebra when specific generalized tuple identifiers are inserted in input generalized relations. The language contains two classes of operators: one class handling generalized relations as an infinite set of relational tuples (tuple operators), and one class handling each generalized relation as a finite set of sets (set operators).

3. We extend the algebra with new operators dealing with external functions and show that the new operators are well defined, i.e., they preserve the closure of the algebra.

Related work: The algebra we propose introduces set computation in the generalized relational algebra. Thus, it introduces some degree of nesting. Note that this is different from the use of constraints representing sets [9], [33]. Indeed, in our framework variables range over numerical domains and not over sets of numbers, as in the approaches cited above. Other languages have been proposed to model complex objects in the generalized relational model [8], [17], [33]. For example, LyriC is a language introducing constraints in an object-oriented framework, thus allowing any kind of nesting [8]. Our proposal differs from LyriC in that we do not propose a new model. Rather, we assign a new semantics to the already defined generalized relational model. The equivalence results we present in the paper are similar to the equivalence results presented for the relational algebra and the nested relational algebra [32], [39]. Also in this case, identifiers are needed to represent complex objects in a flat fashion. With respect to external functions, it is important to recall that the introduction of external functions in query languages has been considered in several papers [28], [37]. Moreover, some approaches have been proposed to extend constraint query languages with aggregate functions [12], [13], [26]. However, as far as we know, no approach has been proposed to introduce external functions in constraint query languages.
Organization of the paper: The paper is organized as follows. The generalized relational model and the generalized relational algebra are presented in Section II and Section III, respectively. Section IV discusses the limitations of the generalized relational model, whereas Section V presents an extended generalized relational model, obtained by extending the notion of generalized tuple. In Section VI we introduce r-based and n-based languages. Then, an n-based language is proposed and proved equivalent to the generalized relational algebra when generalized tuple identifiers are inserted in input relations. Finally, the definition of operators based on the use of application-dependent functions is discussed in Section VII. Section VIII presents some conclusions and outlines future work.

II. The generalized relational model

A constraint identifies an atomic formula of a decidable logical theory [11]. Several classes of constraints have been devised; variables can range among elements of a certain domain or among sets of elements of a certain domain [9], [33]. In this paper, we only consider variables ranging over numerical domains. Some of the possible theories are: real polynomial inequality constraints, dense linear order inequality constraints, equality constraints over an infinite domain [24]. In the following, we assume that each theory $\Phi$ is associated with a specific structure of interpretation $\mathcal{D}$, having $D$ as domain. To simplify the notation, in the following we use $D$ to denote both the interpretation structure and its domain. For example, real polynomial inequality constraints are all the formulas of the form $p(X_1, \ldots, X_n)\ \theta\ 0$, where $p$ is a polynomial with real coefficients in variables $X_1, \ldots, X_n$ and $\theta \in \{=, \neq, \leq, <\}$. The domain $D$ is the set of real numbers; function symbols $+$, $\times$, predicate symbols $\theta$ and constants are interpreted in the standard way over $D$.

The use of constraints to model data is based on the consideration that a relational tuple is a particular type of constraint [24]. For example, the relational tuple $(3, 4)$ for relation $R$ with two real attributes $X$ and $Y$ can be interpreted as the constraint $X = 3 \wedge Y = 4$. Similarly, the formula $X < 2 \wedge Y > 5$ can be interpreted as a generalized tuple, representing the set of relational tuples $\{(a, b) \mid a < 2,\ b > 5,\ a \in \mathbb{R},\ b \in \mathbb{R}\}$. Thus, constraints support the finite representation of possibly infinite sets of tuples. The previous notions can be formally stated as follows.

Definition 1: [24] Let $\Phi$ be a decidable logical theory.

• A generalized tuple $t$ over variables $X_1, \ldots, X_k$ in the logical theory $\Phi$ is a finite conjunction $\varphi_1 \wedge \ldots \wedge \varphi_N$, where each $\varphi_i$, $1 \leq i \leq N$, is a constraint in $\Phi$. The variables in each $\varphi_i$ are among $X_1, \ldots, X_k$. The schema of $t$, denoted by $\alpha(t)$, is the set $\{X_1, \ldots, X_k\}$.

• A generalized relation $r$ of arity $k$ in $\Phi$ is a finite set $r = \{t_1, \ldots, t_M\}$ where each $t_i$, $1 \leq i \leq M$, is a generalized tuple over variables $X_1, \ldots, X_k$ and in $\Phi$. The schema of $r$, denoted by $\alpha(r)$, is the set $\{X_1, \ldots, X_k\}$.

• A generalized database is a finite set of generalized relations. The schema of a generalized database is a set of relation names $R_1, \ldots, R_n$, each with the corresponding schema. □

Generalized relations are interpreted following a relational semantics, by which a generalized relation is interpreted as the finite representation of a possibly infinite set of relational tuples.

Definition 2: Let $\Phi$ be a decidable logical theory. Let $D$ be the domain of $\Phi$. Let $r = \{t_1, \ldots, t_n\}$ be a generalized relation. Let $ext(t_i) = \{\nu \mid \nu : \alpha(t_i) \to D,\ D \models t_i\nu\}$ (footnote 2). A generalized tuple $t_i$ is inconsistent if $ext(t_i) = \emptyset$, i.e., if $\nexists \nu$ such that $D \models t_i\nu$. Two generalized tuples $t_i$ and $t_j$, such that $\alpha(t_i) = \alpha(t_j)$, are equivalent (denoted by $t_i \equiv_r t_j$) iff $ext(t_i) = ext(t_j)$ (thus, $\forall \nu\ D \models t_i\nu \leftrightarrow t_j\nu$). The relational semantics of $r$, denoted by $rel(r)$, is $ext(t_1) \cup \ldots \cup ext(t_n)$. Two generalized relations $r_1$ and $r_2$ are r-equivalent (denoted by $r_1 \equiv_r r_2$) iff $rel(r_1) = rel(r_2)$. □

The set $ext(t)$, for a generalized tuple $t$, and the set $rel(r)$, for a generalized relation $r = \{t_1, \ldots, t_n\}$, are sets of assignments, making $t$ or the formula $t_1 \vee \ldots \vee t_n$ true in

the considered domain. However, each assignment can be seen as a relational tuple. Therefore, in the following the elements of $ext(t)$ or $rel(r)$ are called either assignments or relational tuples, depending on the context.

Example 1: Different theories can be used to model different types of information. According to the definition of spatial data given in [18], the theory of linear polynomial constraints ($P$) has sufficient expressive power to describe the geometric component of spatial data in geographical applications [27]. In order to represent the geometry of geographical objects using constraints, the approach is to use a generalized relation with $n$ variables representing points of an $n$-dimensional space. Generalized tuples of this relation thus represent sets of points embedded in that space. An identifier should be used in order to group points belonging to the same object. In this example we restrict our attention to the Euclidean plane ($E^2$) and we assume that the generalized relation schema is $\{N, X, Y\}$. Variable $N$ represents the object identifier whereas variables $X$ and $Y$ represent object points. The types of point sets of $E^2$ which can be described using generalized tuples on theory $P$ are shown in Table I. The first three types, POINT, SEGMENT and CONVEX, correspond to conjunctions of constraints on the theory $P$. The fourth type, COMPOSITE Spatial Object, corresponds to a disjunction of generalized tuples on $P$.

The representation of time intervals is another interesting application for constraint databases. An interval consists of a time duration which is bounded by two endpoints. These endpoints are instants on the time axis. An interval degenerates to an instant when its endpoints coincide. Moreover, an interval is non-contiguous if it does not contain all instants of the time axis which lie between its endpoints. The dense-order constraint theory (footnote 3) ($D$) [24] is sufficient to represent the types INSTANT, INTERVAL and NON-CONTIGUOUS INTERVAL, as shown in Table II. Note that composite spatial objects and non-contiguous intervals cannot be represented by a single generalized tuple. Rather, a set of generalized tuples is needed, containing one generalized tuple for each convex object belonging to the convex decomposition of the composite spatial object and, respectively, one for each instant or interval belonging to the representation of the non-contiguous interval (see Tables I and II). ◊

III. The generalized relational algebra

The algebraic approach represents the correct formalism to obtain both a formal specification of the language and a suitable basis for implementation. The class of algebras (one for each decidable theory) we present in this section is a direct extension of the relational algebra and is derived from the algebra presented in [16], [23], [31]. Table III presents the operators of the algebra. In the following, the set of algebraic operators, together with their

Footnote 2: Here $\models$ denotes the logical consequence symbol. Thus, $D \models t_i\nu$ means that $t_i\nu$ is true in $D$ [11].

Footnote 3: Constraints of the dense-order theory are of type $X\ \theta\ Y$, $X\ \theta\ c$, where $X$ and $Y$ are variables, $c$ is a constant, and $\theta \in \{=, \neq, \leq, <\}$. The interpretation is given with respect to a dense-ordered domain.

TABLE I

Representation of point sets of the Euclidean plane in $P$

(The graphical representation column of the original table is not reproduced.)

POINT ($p$)
  Analytical representation: $p = (x, y)$
  Representation using gen. tuples in $P$ (a): $C_{point}(p) \equiv (X - x = 0) \wedge (Y - y = 0) \wedge (N = c_{id})$ (b)

SEGMENT ($s$)
  Analytical representation: $s$ is the segment with endpoints $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, lying on the line $r : aX + bY + c = 0$
  Representation using gen. tuples in $P$: $C_{segment}(s) \equiv (x_1 - X \leq 0) \wedge (X - x_2 \leq 0) \wedge (aX + bY + c = 0)$ (c) $\wedge\ (N = c_{id})$

CONVEX ($c$)
  Analytical representation: $c$ is the convex polygon with vertices $P_i = (x_i, y_i)$ and edges on the lines $r_i : a_i X + b_i Y + c_i = 0$, $i = 0, \ldots, n$
  Representation using gen. tuples in $P$: $C_{convex}(c) \equiv (sign(P_1, P_2)\, a_1 X + b_1 Y + c_1 \leq 0) \wedge \ldots \wedge (sign(P_{n-1}, P_n)\, a_n X + b_n Y + c_n \leq 0)$ (d) $\wedge\ (N = c_{id})$

COMPOSITE ($csp$)
  Analytical representation: $csp = (p_1 \cup \ldots \cup p_n) \cup (s_1 \cup \ldots \cup s_m) \cup (c_1 \cup \ldots \cup c_l)$
  Representation using gen. tuples in $P$: $C_{composite}(csp) \equiv \{C_{point}(p_1) \wedge N = c_{id}, \ldots, C_{point}(p_n) \wedge N = c_{id}\} \cup \{C_{segment}(s_1) \wedge N = c_{id}, \ldots, C_{segment}(s_m) \wedge N = c_{id}\} \cup \{C_{convex}(c_1) \wedge N = c_{id}, \ldots, C_{convex}(c_l) \wedge N = c_{id}\}$

(a) In the table, the symbol $\equiv$ denotes syntactic equality.
(b) $c_{id}$ is a numeric constant, representing the object identifier.
(c) One or both of the first two conjuncts of this formula can be removed if a half-line or a complete straight line has to be represented.
(d) The introduction of the function $sign()$ is necessary in order to take into account that the polygonal region represented by a simple polygon is always on the left side of the polygon itself. Thus, function $sign(P_1, P_2)$ returns $1$ or $-1$ according to the direction of the line defined by $P_1$ and $P_2$.

TABLE II

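Representation of subsets of the time axis in $D$

(The graphical part of the original table is not reproduced.)

INSTANT ($i$)
  Analytical representation: the instant $i$ at time $k$
  Representation using gen. tuples in $D$: $C_{instant}(i) \equiv (X = k) \wedge (N = c_{id})$ (a)

INTERVAL ($int$)
  Analytical representation: the interval $int$ with endpoints $k_1$ and $k_2$
  Representation using gen. tuples in $D$: $C_{interval}(int) \equiv (X \geq k_1) \wedge (X \leq k_2) \wedge (N = c_{id})$

NON-CONTIGUOUS INTERVAL ($int_D$)
  Analytical representation: $int_D = (i_1 \cup \ldots \cup i_n) \cup (int_1 \cup \ldots \cup int_m)$
  Representation using gen. tuples in $D$: $C(int_D) \equiv \{C_{instant}(i_1) \wedge N = c_{id}, \ldots, C_{instant}(i_n) \wedge N = c_{id}\} \cup \{C_{interval}(int_1) \wedge N = c_{id}, \ldots, C_{interval}(int_m) \wedge N = c_{id}\}$

(a) $c_{id}$ is a numeric constant.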
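The representations of Table II, together with the sets $ext(t)$ and $rel(r)$ of Definition 2, can be illustrated with a minimal Python sketch. It is not part of the original paper: the names `Constraint`, `in_ext`, `in_rel`, `c_instant` and `c_interval` are hypothetical, and the sketch only tests whether a given assignment belongs to an extension; it does not decide consistency or equivalence.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Assignment = Dict[str, float]


@dataclass(frozen=True)
class Constraint:
    """Atomic linear constraint: sum(coef * var) + const  op  0 (illustrative)."""
    terms: Tuple[Tuple[str, float], ...]
    const: float
    op: str  # "=", "<=" or "<"

    def holds(self, a: Assignment) -> bool:
        value = sum(coef * a[var] for var, coef in self.terms) + self.const
        if self.op == "=":
            return value == 0
        if self.op == "<=":
            return value <= 0
        if self.op == "<":
            return value < 0
        raise ValueError(f"unknown operator: {self.op}")


GeneralizedTuple = List[Constraint]          # a conjunction of constraints
GeneralizedRelation = List[GeneralizedTuple]  # a finite set of generalized tuples


def in_ext(t: GeneralizedTuple, a: Assignment) -> bool:
    """True when assignment a makes every conjunct of t true, i.e. a is in ext(t)."""
    return all(c.holds(a) for c in t)


def in_rel(r: GeneralizedRelation, a: Assignment) -> bool:
    """True when a belongs to ext(t) for some tuple t of r, i.e. a is in rel(r)."""
    return any(in_ext(t, a) for t in r)


def c_instant(k: float, cid: float) -> GeneralizedTuple:
    """(X = k) AND (N = cid), following Table II."""
    return [Constraint((("X", 1),), -k, "="), Constraint((("N", 1),), -cid, "=")]


def c_interval(k1: float, k2: float, cid: float) -> GeneralizedTuple:
    """(X >= k1) AND (X <= k2) AND (N = cid), following Table II."""
    return [
        Constraint((("X", -1),), k1, "<="),   # k1 - X <= 0
        Constraint((("X", 1),), -k2, "<="),   # X - k2 <= 0
        Constraint((("N", 1),), -cid, "="),
    ]


# A non-contiguous interval with identifier 1: the instant 5 plus the interval [10, 20].
non_contiguous = [c_instant(5, 1), c_interval(10, 20, 1)]
```

The relation `non_contiguous` contains two generalized tuples sharing the identifier value 1, which mirrors the remark in Example 1 that a non-contiguous interval needs one generalized tuple per instant or interval.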

arity, is denoted by GRA whereas the syntactic language (i.e., the set of expressions obtained by combining these operators) is denoted by GRA (Generalized Relational Algebra). Following the approach proposed in [22], each operator of Table III is described by using two kinds of clauses: those presenting the schema restrictions required by the argument relations and by the result relation, and those introducing the operator semantics. $R_1, \ldots, R_n$ are relation names and $e$ represents the syntactic expression under

analysis. The semantics of expressions is described by using an interpretation function $\mu$ that takes an expression $e$ and returns the corresponding query (footnote 4). The query takes a set of generalized relations on a theory $\Phi$ and computes a new generalized relation as result, containing only consistent generalized tuples. Note that, in order to guarantee operator closure (of projection and complement), the considered theory $\Phi$ must admit variable elimination and must be closed under complementation (footnote 5). Finally, note that Table III, together with the resulting relation, also presents the relational semantics of such relation (footnote 6).

Table III thus defines a class of algebras, one for each theory $\Phi$ admitting variable elimination and closed under complementation. The support of this algebra is the set of all generalized relations that can be constructed on theory $\Phi$. In the following, given a constraint theory $\Phi$, we denote by GRA($\Phi$) the set of all the queries that can be expressed in the algebra on theory $\Phi$. GRA($\Phi$) satisfies an important property: the result of the application of a GRA($\Phi$) query to a generalized database corresponds to the application of the corresponding relational algebra query to the relational database represented by the relational semantics of the input generalized relations. This property is stated by the following proposition.

Proposition 1: [23] Let $OP$ be a GRA operator and let $OP^{rel}$ be the corresponding relational algebra operator. Let $r_i$, $i = 1, \ldots, n$, be generalized relations on theory $\Phi$. Then, $rel(\mu(OP)(r_1, \ldots, r_n)) = \mu(OP^{rel})(rel(r_1), \ldots, rel(r_n))$. □

By using the operators of Table III, some useful derived operators can be defined, whose semantics is described in Table IV. Example 2 shows the use of GRA to express queries in spatial and temporal applications.
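The way GRA operators act on generalized tuples rather than on individual points can be sketched in Python under a strong simplifying assumption that is not made in the paper: every generalized tuple is a box, i.e. a conjunction of closed bounds `low <= variable <= high`. Under that assumption conjunction is interval intersection, consistency is a bounds check, and variable elimination simply drops a variable. All function names are hypothetical.

```python
# A generalized tuple is modelled as a box {variable: (low, high)},
# i.e. the conjunction of low <= variable <= high for each variable.
# A generalized relation is a list of boxes over the same variables.
# Input tuples are assumed to be consistent, as in Table III.


def conj(t1, t2):
    """Return the tuple t1 AND t2, or None when the conjunction is inconsistent."""
    out = dict(t1)
    for var, (lo, hi) in t2.items():
        if var in out:
            lo, hi = max(lo, out[var][0]), min(hi, out[var][1])
        if lo > hi:
            return None
        out[var] = (lo, hi)
    return out


def select(r, p):
    """Selection: keep only the consistent tuples t AND p."""
    candidates = (conj(u, p) for u in r)
    return [t for t in candidates if t is not None]


def natural_join(r1, r2):
    """Natural join: all consistent conjunctions t1 AND t2."""
    candidates = (conj(t1, t2) for t1 in r1 for t2 in r2)
    return [t for t in candidates if t is not None]


def union(r1, r2):
    """Union: the tuples of both relations."""
    return r1 + r2


def project(r, variables):
    """Projection; for boxes, eliminating a variable just drops its bounds."""
    return [{v: t[v] for v in variables} for t in r]


def in_ext(t, a):
    """True when assignment a satisfies every bound of tuple t."""
    return all(lo <= a[v] <= hi for v, (lo, hi) in t.items())


def in_rel(r, a):
    """True when assignment a belongs to the relational semantics of r."""
    return any(in_ext(t, a) for t in r)


# Usage, in the spirit of the instant query of Table V(B): trains standing
# at the station at minute 610. The data values are made up for illustration.
A = [
    {"N": (1, 1), "F": (4, 4), "I": (600, 615), "T": (3, 3)},
    {"N": (2, 2), "F": (5, 5), "I": (620, 640), "T": (3, 3)},
]
standing_at_610 = project(select(A, {"I": (610, 610)}), ["N"])
```

Each operator returns a finite list of boxes, so the result is again a generalized relation of the same restricted form. Complement is deliberately left out, since the complement of a box is not a box and would need the disjunctive normal form construction of Table III.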
Example 2: Table V(A) shows some spatial queries referring to the geographical application presented in Example 1 (the reader can refer to [14], [15], [18], [34], [38] for some examples of spatial query languages and models). For each query, the table contains a textual description and the mapping to GRA. Queries refer to two sets of spatial objects, which are represented by two generalized relations $R$ and $S$ on $P$, where $\alpha(R) = \alpha(S) = \{N, X, Y\}$. $N$ is the generalized tuple identifier whereas $X$ and $Y$ represent points of the spatial objects. Table V(B) shows some queries involving temporal data. The queries concern the trains arriving at a transit station $S$ and leaving from the same station $S$. The entire set of information is represented by a generalized relation $A$ on $D$ (defined in Example 1) with four variables $(N, F, I, T)$. Variable $I$ represents the interval during which the train stops at station $S$, variable $F$ represents the numeric code of the departure station of the train, and variable $T$ represents the numeric code of the destination station of the train. Variable $N$ uniquely identifies each group of information (thus, it is a generalized tuple identifier). The time is expressed in minutes from the beginning of the day. ◊

Footnote 4: Formally, a query is a partial mapping between database instances, invariant with respect to permutations of the domain [10].

Footnote 5: A theory admits variable elimination if each formula $\exists x\, F(x)$ of the theory is equivalent to a formula $G$, where $x$ does not appear. A theory $\Phi$ is closed under complementation if, when $c$ is a constraint of $\Phi$, then $\neg c$ is equivalent to another constraint $c'$ of $\Phi$.

Footnote 6: Other interpretations could have been defined, maintaining the same semantics for resulting relations.

IV. Limitations of the generalized relational model

The model presented in Section II and the class of algebras presented in Section III have some limitations. In what follows, we discuss such limitations and we point out the corresponding extensions to GRA($\Phi$) that we propose in order to overcome them.

1. Definition and semantics of generalized tuples. The expressive power of generalized tuples is lower than the expressive power of first-order formulas without quantifiers. In particular, speaking in terms of spatial data, each generalized tuple can represent only a convex set of points. Thus, in order to model a concave set of points, a convex decomposition of the concave object should be generated; each convex object belonging to such decomposition should then be represented by using a single generalized tuple. In order to relate all these tuples, an identifier should be assigned to each convex object (see Example 1). This approach may result in some degree of redundancy, in particular when descriptive properties have to be associated with concave objects. We claim that a more general definition of generalized databases can be given. The solution we propose is to define a more general notion of generalized tuple, allowing the use of arbitrary sets of logical connectives to connect constraints. For example, when disjunction is allowed, a constraint can represent either a concave or a convex point set. Another consideration is related to the semantics assigned to generalized relations. Under the relational semantics introduced in Section II, each generalized relation represents a (possibly infinite) set of tuples, whatever notion of generalized tuple is adopted. We believe that the relational semantics is not the only way to assign a meaning to generalized relations. In particular, a generalized relation can also be interpreted as a nested relation [1], [2], containing a finite number of possibly infinite sets, each corresponding to the extension of a generalized tuple. These issues are investigated in Section V.

2. Algebras. The class of algebras (one for each decidable theory admitting variable elimination and closed under complementation) presented in Section III handles a generalized relation as a (possibly infinite) set of relational tuples. This approach forces the user to think in terms of single points; as a consequence, the only way to manipulate generalized tuples as single objects is to assign each generalized tuple an identifier. By assigning a nested semantics to generalized relations, the user has to think in terms of sets. Therefore,

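TABLE III

GRA operators

Each entry gives the syntax of the expression $e$, the schema restrictions, and the semantics of the result $r = \mu(e)(r_1, \ldots, r_n)$, $n \in \{1, 2\}$ (a).

atomic relation
  Syntax $e$: $R_1$
  Restrictions: $\alpha(e)$ (b) $= \alpha(R_1)$
  Semantics: $r = r_1$; $rel(r) = rel(r_1)$

selection
  Syntax $e$: $\sigma_P(R_1)$
  Restrictions: $\alpha(P) \subseteq \alpha(R_1)$; $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{t \wedge P \mid t \in r_1,\ ext(t \wedge P) \neq \emptyset\}$; $rel(r) = \{t \mid t \in rel(r_1),\ t \in ext(P)\}$

renaming
  Syntax $e$: $\varrho_{[A|B]}(R_1)$
  Restrictions: $A \in \alpha(R_1)$, $B \notin \alpha(R_1)$; $\alpha(e) = (\alpha(R_1) \setminus \{A\}) \cup \{B\}$
  Semantics: $r = \{t[A|B] : t \in r_1\}$; $rel(r) = \{t[A|B]$ (c) $: t \in rel(r_1)\}$

union
  Syntax $e$: $R_1 \cup R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) = \alpha(R_2)$
  Semantics: $r = \{t \mid t \in r_1 \text{ or } t \in r_2\}$; $rel(r) = rel(r_1) \cup rel(r_2)$

projection
  Syntax $e$: $\Pi_{[X_{i_1}, \ldots, X_{i_p}]}(R_1)$
  Restrictions: $\alpha(R_1) = \{X_1, \ldots, X_m\}$; $\alpha(e) = \{X_{i_1}, \ldots, X_{i_p}\}$; $\alpha(e) \subseteq \alpha(R_1)$
  Semantics: $r = \{\pi_{[X_{i_1}, \ldots, X_{i_p}]}(t)$ (e) $\mid t \in r_1,\ ext(\pi_{[X_{i_1}, \ldots, X_{i_p}]}(t)) \neq \emptyset\}$; $rel(r) = \{\Pi_{[X_{i_1}, \ldots, X_{i_p}]}(t)$ (d) $: t \in rel(r_1)\}$

natural join
  Syntax $e$: $R_1 \bowtie R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) \cup \alpha(R_2)$
  Semantics: $r = \{t_1 \wedge t_2 \mid t_1 \in r_1,\ t_2 \in r_2,\ ext(t_1 \wedge t_2) \neq \emptyset\}$; $rel(r) = \{t_1 \bowtie t_2 : t_1 \in rel(r_1),\ t_2 \in rel(r_2)\}$

complement (f)
  Syntax $e$: $\neg R_1$
  Restrictions: $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{t'_1, \ldots, t'_m \mid t'_1 \vee \ldots \vee t'_m$ is the disjunctive normal form of $\neg t_1 \wedge \ldots \wedge \neg t_n$, $r_1 = \{t_1, \ldots, t_n\}$, $ext(t'_i) \neq \emptyset$, $i = 1, \ldots, m\}$; $rel(r) = \{t \mid t \notin rel(r_1)\}$

(a) We assume that $r_i$ does not contain inconsistent generalized tuples.
(b) We denote by $\alpha(e)$ the schema of the relation obtained by evaluating the query corresponding to expression $e$.
(c) Given an expression $F$, $F[A|B]$ replaces variable $A$ in $F$ with variable $B$.
(d) This is the relational projection operator.
(e) Given a generalized tuple $t$, the expression $\pi_{[X_{i_1}, \ldots, X_{i_p}]}(t)$ represents the generalized tuple obtained by applying a quantifier elimination algorithm to the formula $\exists\, (\alpha(r) \setminus \{X_{i_1}, \ldots, X_{i_p}\})\ t$.
(f) Complement is needed to prove the equivalence of the constraint algebra with the constraint relational calculus [24]. Actually, the algebra proposed in [23] does not include the complement operator. This operator can be simulated by assuming to deal with a relation representing all possible relational tuples on the given domain. In our setting, we suppose that algebraic operators can only be applied to relations belonging to the schema. Therefore, we need to explicitly insert this operator.

TABLE IV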

GRA derived operators

difference
  Syntax $e$: $R_1 \setminus R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) = \alpha(R_2)$
  Derived expression: $R_1 \bowtie \neg R_2$

Cartesian product
  Syntax $e$: $R_1 \times R_2$
  Restrictions: $\alpha(r_1) \cap \alpha(r_2) = \emptyset$; $\alpha(e) = \alpha(R_1) \cup \alpha(R_2)$
  Derived expression: $R_1 \bowtie R_2$

intersection
  Syntax $e$: $R_1 \cap R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) = \alpha(R_2)$
  Derived expression: $R_1 \bowtie R_2$

new classes of algebras should be introduced in order to be able to manipulate objects under the new semantics. We propose two classes of algebras, one for each semantics. Moreover, we present a specific class of algebras adopting the nested semantics. This issue is investigated in Section VI.

3. Application Domains. The expressiveness of the generalized relational algebra depends on both the set of algebraic operators and the chosen logical theory. Different logical theories should be used for different application domains (e.g., temporal [25] or spatial [30], [31] applications). However, often the chosen logical theory is not adequate to support all the functionalities needed by the specific application. For example, in the linear polynomial constraint theory the Euclidean distance cannot be represented; thus, if applications require the use of such a function, the more general (and computationally less efficient) polynomial constraint theory should be adopted. In order to overcome this problem, we extend the algebra with external functions. This approach avoids the introduction of a "complex" logical theory by making it possible to adopt a "simple" logic, for example the linear polynomial inequality constraint theory, and to add the specific computations characterized by high complexity as external functions. In this way, the theory remains simple and the increase in expressive power, and in general in complexity, is embedded in the set of external functions. This issue is investigated in Section VII.

In the following sections, we formalize our solutions.

V. The extended generalized relational model

In the following, we extend the definition of generalized tuples, in order to be able to express more general sets in their extension. This is possible by using additional logical connectives in generalized tuples. The basic requirement is that generalized tuples must be quantifier-free, to guarantee an efficient computation.⁷ As we will see in Section VI, the use of more expressive generalized tuples makes it possible to increase the expressive power of some classes of constraint languages. In the following, a set of first-order logical connectives without quantifiers is called a signature.

⁷ LyriC [8] is an example of a constraint object-oriented model where quantified constraints are allowed.

TABLE V
Examples of spatial and temporal queries in GRA(P) and GRA(D)

(A) Spatial queries

Type: RANGE ERASE QUERY
Query: calculate all spatial objects obtained as difference between an object in $R$ and a rectangle $rt \in E^2$
GRA expression: $R \setminus \sigma_P(R \cup \neg R)$ (a)
Conditions: $P \equiv C_{convex}(rt)$ (b), $\alpha(P) = \{X, Y\}$

Type: RANGE QUERY ON PROJECTION
Query: retrieve all spatial objects in $R$ whose projection on the X axis intersects the interval $[x_0, x_1]$
GRA expression: $\Pi_{[N]}(\sigma_P(\Pi_{[N,X]}(R))) \bowtie R$
Conditions: $P \equiv (x_0 \le X) \wedge (X \le x_1)$

Type: SPATIAL INTERSECTION
Query: generate all spatial objects that are the intersection between one object in $R$ and one object in $S$
GRA expression: $R \bowtie \varrho_{[N|N']}(S)$

(B) Temporal queries

Type: INSTANT QUERY
Query: select the identifiers of all trains standing by at station S at time $t$
GRA expression: $\Pi_{[N]}(\sigma_P(A))$
Conditions: $P \equiv C_{instant}(t)$ (c), $\alpha(P) = \{I\}$

Type: INTERVAL QUERY
Query: select the identifiers of all trains that leave to station 3 after time $t$, together with their departure station
GRA expression: $\Pi_{[N,F]}(\sigma_P(\sigma_Q(A)))$
Conditions: $P \equiv (T = 3)$, $Q \equiv Q_{Post}(C_{instant}(t))$ (d), $\alpha(Q) = \{I\}$

Type: TEMPORAL JOIN
Query: select the identifiers of all trains with destination station 3, standing by at station S together with a train from station 4
GRA expression: $\Pi_{[N]}(\sigma_P(R) \bowtie A')$, where $A' \equiv \varrho_{[N|N', F|F', T|T']}(\sigma_Q(A))$
Conditions: $P \equiv (T = 3)$; $Q \equiv (F = 4)$

(a) The difference operator ($\setminus$) is defined as a derived operator in Table IV.
(b) $C_{convex}(\cdot)$ is defined in Table I.
(c) $C_{instant}(\cdot)$ is defined in Table II.
(d) $Q_{Post}(t)$ is a short form for the set of instants that follow $t$.

Definition 3: Let $\Phi$ be a decidable logical theory and $\Theta$ a signature. A generalized tuple on $\Phi$ and $\Theta$ over variables $X_1, \dots, X_k$ is a first-order formula whose free variables belong to $X_1, \dots, X_k$, whose atoms are atomic formulas on $\Phi$, and whose logical connectives belong to $\Theta$. A generalized relation on $\Phi$ and $\Theta$ is a set of generalized tuples on $\Phi$ and $\Theta$, and a generalized database on $\Phi$ and $\Theta$ is a set of generalized relations on $\Phi$ and $\Theta$. □

Notice that, under the new definition, the generalized tuples introduced in Definition 1 are generalized tuples on $\Phi$ and $\{\wedge\}$. Since we have defined $\Theta$ to be a set of first-order logical connectives without quantifiers, the only possible signatures are: $\{\wedge\}$, $\{\vee\}$, $\{\wedge, \vee\}$, $\{\neg, \wedge\}$, $\{\neg, \vee\}$, $\{\neg, \wedge, \vee\}$. If we consider only theories $\Phi$ closed under complementation, the sets of all generalized tuples on $\Phi$ and any one of the signatures $\{\wedge, \vee\}$, $\{\neg, \wedge\}$, $\{\neg, \vee\}$, $\{\neg, \wedge, \vee\}$ coincide. Therefore, in the following, to simplify the notation, we only consider the signatures $\{\wedge\}$, $\{\vee\}$, and $\{\wedge, \vee\}$. Generalized tuples on $\Phi$ and $\{\wedge, \vee\}$ allow us to represent all sets that can be characterized in first-order logic without quantifiers and are called disjunctive generalized tuples or d-generalized tuples. For the discussion that follows, it is useful to name the set of generalized relations on $\Phi$ and $\Theta$, which leads to the definition of extended generalized relational support.
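The claim that negation adds nothing over a theory closed under complementation can be illustrated with a small sketch. The Python code below is not part of the paper: it assumes a toy theory of order constraints between a variable and a constant, a hypothetical tuple encoding of formulas, and a hypothetical helper `eliminate_not` that rewrites a formula on the signature with negation into an equivalent one using only conjunction and disjunction.

```python
import operator

# Assumed toy theory: atoms compare one variable with one constant.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}
# Complement of each order atom, expressed again as a single atom.
COMPLEMENT = {"<": ">=", "<=": ">", ">": "<=", ">=": "<"}


def eliminate_not(f, negate=False):
    """Return a formula equivalent to f (or to its negation) without 'not'."""
    kind = f[0]
    if kind == "atom":
        if not negate:
            return f
        _, var, op, const = f
        if op == "=":
            # The complement of an equality is a disjunction of two atoms.
            return ("or", ("atom", var, "<", const), ("atom", var, ">", const))
        return ("atom", var, COMPLEMENT[op], const)
    if kind == "not":
        return eliminate_not(f[1], not negate)
    left = eliminate_not(f[1], negate)
    right = eliminate_not(f[2], negate)
    if negate:
        # De Morgan: swap the connective when pushing a negation inward.
        kind = "or" if kind == "and" else "and"
    return (kind, left, right)


def holds(f, env):
    """Evaluate a formula on an assignment of values to variables."""
    kind = f[0]
    if kind == "atom":
        _, var, op, const = f
        return OPS[op](env[var], const)
    if kind == "not":
        return not holds(f[1], env)
    if kind == "and":
        return holds(f[1], env) and holds(f[2], env)
    return holds(f[1], env) or holds(f[2], env)


formula = ("not", ("and", ("atom", "X", "<=", 2), ("not", ("atom", "Y", "=", 3))))
print(eliminate_not(formula))
```

The rewriting relies only on De Morgan's laws and on the complement of each atom being expressible in the theory, which is the closure assumption made in the text.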

Definition 4: Let $\Phi$ be a decidable logical theory and $\Theta$ a signature. The set of all generalized relations on $\Phi$ and $\Theta$ (denoted by $S(\Phi, \Theta)$) is called extended generalized relational support (EGR support for short) on $\Phi$ and $\Theta$. □

Note that the generalized relations introduced in Definition 1 belong to $S(\Phi, \{\wedge\})$.

Example 3: Tables VI and VII show how composite spatial objects and non-contiguous intervals can be represented using disjunctive generalized tuples. In these representations, each disjunct represents, respectively, a convex polygon belonging to the convex decomposition of the original object, or an instant or an interval belonging to the representation of the non-contiguous interval. No generalized tuple identifier is needed in this case. ◇

A. Nested semantics for EGR supports

The relational semantics is not the only way to assign a meaning to generalized relations. In particular, generalized relations can be interpreted as nested relations [1], [2]. A nested relation is a relation in which attributes may contain sets as values. A generalized relation can be interpreted as a nested relation containing a finite number of possibly infinite sets, each representing the extension of a generalized tuple. This interpretation leads to the definition of the following semantics.

Definition 5: Let $r = \{t_1, \dots, t_n\}$ be a generalized relation. The nested semantics of $r$, denoted by $nested(r)$, is the set $\{ext(t_1), \dots, ext(t_n)\}$. Two generalized relations $r_1$ and $r_2$ are n-equivalent (denoted by $r_1 \equiv_n r_2$) iff $nested(r_1) = nested(r_2)$. □


TABLE VI
Representation of concave point sets of the Euclidean plane in P

COMPOSITE (csp)
Graphical representation: [figure omitted]
Analytical representation: $csp = (p_1 \cup \dots \cup p_n) \cup (s_1 \cup \dots \cup s_m) \cup (c_1 \cup \dots \cup c_l)$
Representation using d-gen. tuples in P: $C_{composite}(csp) \equiv (C_{point}(p_1) \vee \dots \vee C_{point}(p_n)) \vee (C_{segment}(s_1) \vee \dots \vee C_{segment}(s_m)) \vee (C_{convex}(c_1) \vee \dots \vee C_{convex}(c_l))$

TABLE VII
Representation of non-contiguous subsets of the time axis in D

NON-CONTIGUOUS INTERVAL ($int_D$)
Analytical and graphical representation: $(i_1 \cup \dots \cup i_n) \cup (int_1 \cup \dots \cup int_m)$ [figure omitted]
Representation using d-gen. tuples in D: $C(int_D) \equiv (C_{instant}(i_1) \vee \dots \vee C_{instant}(i_n)) \vee (C_{interval}(int_1) \vee \dots \vee C_{interval}(int_m))$

Note that distinct generalized tuples with the same extension represent the same object in the generalized relation. From Definition 5 it follows that, if two generalized relations are n-equivalent, they are also r-equivalent.

Proposition 2: $\equiv_n \; \subseteq \; \equiv_r$.

Proof: Suppose that $r = \{t_1, \dots, t_n\}$ and $r' = \{t'_1, \dots, t'_n\}$. Then

$$r \equiv_n r' \;\Rightarrow\; nested(r) = nested(r') \;\Rightarrow\; \{ext(t_1), \dots, ext(t_n)\} = \{ext(t'_1), \dots, ext(t'_n)\} \;\Rightarrow\; \bigcup_{i=1,\dots,n} ext(t_i) = \bigcup_{i=1,\dots,n} ext(t'_i) \;\Rightarrow\; rel(r) = rel(r') \;\Rightarrow\; r \equiv_r r'.$$

The difference between the relational and nested semantics is best shown by the following example.

Example 4: Consider a generalized relation $r_1$ containing only the generalized tuple $1 \le X \le 2 \wedge 2 \le Y \le 4$ and the generalized relation $r_2$ containing the generalized tuples $1 \le X \le 2 \wedge 2 \le Y \le 3$ and $1 \le X \le 2 \wedge 3 \le Y \le 4$. It is simple to show that $r_1 \equiv_r r_2$. However, $r_1 \not\equiv_n r_2$, since the sets represented inside $r_1$ and $r_2$ are different. ◇

B. Equivalence between EGR supports
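The two notions of equivalence used in this subsection can be checked mechanically on the relations of Example 4. The following Python sketch is only an illustration under stated assumptions: each generalized tuple is restricted to a non-empty closed axis-parallel box, encoded as a pair of intervals, and the function names `nested`, `r_equivalent`, and `n_equivalent` are hypothetical. Relational equivalence is decided exactly by testing one representative point per cell of the grid induced by all interval endpoints.

```python
from fractions import Fraction
from itertools import product

# Assumption: a generalized tuple is a non-empty closed box
# ((x_lo, x_hi), (y_lo, y_hi)); a generalized relation is a set of boxes.


def nested(r):
    """Nested semantics: one object per box (distinct boxes differ as sets)."""
    return frozenset(r)


def _representatives(values):
    """Endpoints plus one midpoint between each pair of consecutive endpoints."""
    vs = sorted(set(values))
    return vs + [Fraction(a + b, 2) for a, b in zip(vs, vs[1:])]


def in_rel(r, x, y):
    """Membership of the point (x, y) in the relational semantics rel(r)."""
    return any(xl <= x <= xh and yl <= y <= yh for (xl, xh), (yl, yh) in r)


def r_equivalent(r1, r2):
    """rel(r1) == rel(r2): membership is constant on each grid cell."""
    boxes = r1 | r2
    xs = _representatives(v for (x_iv, _y_iv) in boxes for v in x_iv)
    ys = _representatives(v for (_x_iv, y_iv) in boxes for v in y_iv)
    return all(in_rel(r1, x, y) == in_rel(r2, x, y) for x, y in product(xs, ys))


def n_equivalent(r1, r2):
    """nested(r1) == nested(r2)."""
    return nested(r1) == nested(r2)


r1 = {((1, 2), (2, 4))}
r2 = {((1, 2), (2, 3)), ((1, 2), (3, 4))}
print(r_equivalent(r1, r2), n_equivalent(r1, r2))
```

On these inputs the first check succeeds and the second fails, which is the situation described in Example 4: the two relations cover the same points but group them into different objects.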

The aim of this subsection is to compare the expressive power of different EGR supports, with particular attention to $S(\Phi, \{\wedge, \vee\})$ and $S(\Phi, \{\wedge\})$. For this purpose, we first introduce the concepts of containment and equivalence for supports. Since we have defined two semantics, two notions of equivalence are introduced. We propose a general definition of these concepts, considering supports with arbitrary theories and signatures.

Definition 6: Let $S(\Phi, \Theta_1)$ and $S(\Phi, \Theta_2)$ be two EGR supports. Let $t \in \{r, n\}$. $S(\Phi, \Theta_1)$ is t-contained in $S(\Phi, \Theta_2)$ (denoted by $S(\Phi, \Theta_1) \subseteq_t S(\Phi, \Theta_2)$) iff for each generalized relation $r \in S(\Phi, \Theta_1)$ there exists a generalized relation $r' \in S(\Phi, \Theta_2)$ such that $r \equiv_t r'$. $S(\Phi, \Theta_1)$ and $S(\Phi, \Theta_2)$ are t-equivalent (denoted by $S(\Phi, \Theta_1) \equiv_t S(\Phi, \Theta_2)$) iff $S(\Phi, \Theta_1)$ is t-contained in $S(\Phi, \Theta_2)$ and $S(\Phi, \Theta_2)$ is t-contained in $S(\Phi, \Theta_1)$. □

From the properties of first-order logic connectives [11], the following result follows.

Proposition 3: Let $S(\Phi, \Theta_1)$ and $S(\Phi, \Theta_2)$ be two EGR supports. $S(\Phi, \Theta_1) \equiv_r S(\Phi, \Theta_2)$ iff the signature $\Theta_1 \cup \{\vee\}$ is equivalent⁸ to $\Theta_2 \cup \{\vee\}$. $S(\Phi, \Theta_1) \equiv_n S(\Phi, \Theta_2)$ iff the signature $\Theta_1$ is equivalent to $\Theta_2$. □

⁸ Two sets of first-order logic operators $A$ and $B$ are equivalent iff for each formula that can be expressed using operators in $A$ there exists an equivalent formula, expressed using operators in $B$, and vice versa.

From the previous proposition, it follows that $S(\Phi, \{\wedge, \vee\})$ is r-equivalent to $S(\Phi, \{\wedge\})$, but $S(\Phi, \{\wedge, \vee\})$ is not n-equivalent to $S(\Phi, \{\wedge\})$ [11].

VI. Extended generalized relational algebras

The class of algebras (one for each decidable logical theory admitting variable elimination and closed under complementation) presented in Section III is based on the relational semantics for generalized databases (see Table III and Proposition 1). In general, when adopting the nested semantics for generalized relations, other operators can be defined, considering the extension of each generalized tuple as a single object. The following example clarifies which operations can be useful.

Example 5: Consider a relation $R$, representing spatial objects contained in the Euclidean plane and having schema $N, X, Y$, where $N$ is a generalized tuple identifier and $X$ and $Y$ represent the object points. Consider the query "Find all objects in $R$ that are contained in the object $o$".


Let $P$ be the generalized tuple representing "$o$" in the Euclidean space. Let $\alpha(P) = \{X, Y\}$. This query is expressed in GRA($\Phi$) as follows:

$$(\Pi_{[N]}(R) \setminus \Pi_{[N]}(R \setminus \sigma_P(R))) \bowtie R.$$

The previous expression has the following meaning:
• $\sigma_P(R)$ selects the points $(X, Y)$ of $R$ contained in $P$, together with the identifier of the object to which they belong.
• $R \setminus \sigma_P(R)$ selects the points $(X, Y)$ that are not contained in $P$, together with the identifier of the object to which they belong.
• $\Pi_{[N]}(R \setminus \sigma_P(R))$ selects the identifiers of the objects having at least one point not contained in $P$. Thus, all the retrieved identifiers correspond to objects not contained in $P$.
• $\Pi_{[N]}(R) \setminus \Pi_{[N]}(R \setminus \sigma_P(R))$ selects the identifiers of the objects contained in $P$.
• $(\Pi_{[N]}(R) \setminus \Pi_{[N]}(R \setminus \sigma_P(R))) \bowtie R$ selects the objects contained in $P$.

The previous expression is neither simple to write nor easy to understand, even though the query is one of the most common in spatial applications. The problem is that the query deals with the extension of generalized tuples taken as single objects, whereas, in general, GRA operators deal with single relational tuples, belonging to the extension of generalized tuples. ◇

In a general setting, we believe that at least two classes of algebras to manipulate generalized relations can be designed:
• R-based algebras: R-based algebras are such that the relational semantics of the result of any query they can express is equivalent to the result of an equivalent relational algebra query, when applied to a set of relations representing the relational semantics of the input generalized relations (as the algebras presented in Section III).
• N-based algebras: N-based algebras are such that the nested semantics of the result of any query they can express is equivalent to the result of an equivalent nested relational algebra query [1], [2], when applied to a set of nested relations representing the nested semantics of the input generalized relations.
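The step-by-step reading of the GRA expression of Example 5 can be traced on a small instance. The Python sketch below is an illustration only: it assumes that the relational semantics of $R$ is given as a finite set of `(N, X, Y)` tuples (in general it is infinite), that $P$ is supplied as a membership predicate, and the helper name `contained_objects` is hypothetical.

```python
def contained_objects(r, in_p):
    """Evaluate (Pi_N(R) minus Pi_N(R minus sigma_P(R))) join R on a finite R."""
    # sigma_P(R): points of R that lie inside P, with their object identifier.
    inside = {t for t in r if in_p(t[1], t[2])}
    # R minus sigma_P(R): points of R that lie outside P.
    outside = r - inside
    # Pi_N(...): identifiers of objects with at least one point outside P.
    not_contained_ids = {n for (n, _x, _y) in outside}
    # Pi_N(R) minus the previous set: identifiers of objects contained in P.
    contained_ids = {n for (n, _x, _y) in r} - not_contained_ids
    # Natural join with R on N: all points of the contained objects.
    return {t for t in r if t[0] in contained_ids}


# Object "a" lies inside the square [0, 2] x [0, 2]; object "b" does not.
R = {("a", 0, 0), ("a", 1, 1), ("b", 1, 1), ("b", 5, 5)}
square = lambda x, y: 0 <= x <= 2 and 0 <= y <= 2
print(sorted(contained_objects(R, square)))
```

The sketch makes the point of the example visible: containment is a property of a whole object, yet the relational operators only see individual points, so the identifier $N$ has to carry the grouping through two differences and a join.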
All relational algebra expressions can obviously be expressed in the nested relational algebra. It has been proved that the converse also holds [32], when input and output relations are not nested objects. When input/output relations are nested objects, the equivalence is guaranteed by the use of object identifiers to code nested objects into flat ones [39].

In the remainder of this paper, we use the following notation. Let $\Omega$ be a set of operator symbols, together with their arity (consider for example GRA).
• $L_\Omega$ is the syntactic language generated by the signature $\Omega$. For example, let $GRA = L_{\Omega_{GRA}}$. Then, as an example, $R_1 \bowtie R_2 \in GRA$.

• $A(\Omega)$ is the class of algebras on $\Omega$, for which the semantics of operators is fixed, but the support is not. Thus, the semantics of each operator $op$ can be seen as a function $f_{op}$ with polymorphic type. For example, given an arbitrary support $S(\Phi, \Theta)$, Table III specifies the operator semantics for the algebra whose support is $S(\Phi, \Theta)$. Thus, whatever the chosen support is, the table assigns a specific meaning to syntactic operators.
• $L_\Omega(\Phi)$ is the set of all the queries obtained by composition of functions representing the operator semantics (thus, it is a semantic language). For each expression $e \in L_\Omega$, there exists a function $\mu(e) \in L_\Omega(\Phi)$ representing the semantics of $e$. Note that each query in $L_\Omega(\Phi)$ is a function with polymorphic type, since it can be applied to arbitrary supports. Moreover, there exists a one-to-one correspondence between expressions contained in $L_\Omega$ and queries contained in $L_\Omega(\Phi)$. For this reason, in the following, when it is clear from the context, we use $L_\Omega$ and $L_\Omega(\Phi)$ interchangeably to denote both the syntactic and the semantic language. Similarly, an expression $e$ is also used to denote the semantic function $\mu(e)$. Semantic functions associated with operators in $\Omega$ will also be called $L_\Omega(\Phi)$ operator semantic functions.
• $L_\Omega(\Phi, \Theta)$ is the set of all the queries obtained by composition of functions representing operator semantics, forcing the type to be $S(\Phi, \Theta)$.

Note that, using this notation, GRA($\Phi$), introduced in Section III, corresponds to GRA($\Phi, \{\wedge\}$). Thus, from now on, we use this notation. Using this notation we can finally introduce n-based and r-based languages.

Definition 7: Let $L$ be a syntactic language. Let $\Phi$ be a logical theory admitting variable elimination and closed under complementation. Let $Rel$ be the set of all relational queries. Let $NRel$ be the set of all nested relational queries. Then:
• $L(\Phi)$ is r-based iff there exists a query mapping $h : L(\Phi) \to Rel$ such that $h(q) = q'$ and, for all supports $S(\Phi, \Theta)$ and all generalized relations $r_i \in S(\Phi, \Theta)$, $i = 1, \dots, n$, $rel(q(r_1, \dots, r_n)) = q'(rel(r_1), \dots, rel(r_n))$ (see Fig. 1).
• $L(\Phi)$ is n-based iff there exists a query mapping $h : L(\Phi) \to NRel$ such that $h(q) = q'$ and, for all supports $S(\Phi, \Theta)$ and all generalized relations $r_i \in S(\Phi, \Theta)$, $i = 1, \dots, n$, $nested(q(r_1, \dots, r_n)) = q'(nested(r_1), \dots, nested(r_n))$ (see Fig. 1). □

Fig. 1. R-based and n-based languages (commutative diagram: $q$ maps $r_1, \dots, r_n$ to $r_{n+1}$; $rel$/$nested$ maps each $r_i$ to $r'_i$; $q'$ maps $r'_1, \dots, r'_n$ to $r'_{n+1}$).

Note that Definition 7 implies that algebra operators are independent of the chosen support, i.e., similar computations can be applied to different supports. Moreover, from Definition 7 and Proposition 1, it follows that GRA($\Phi$) is r-based. Since relational operators are part of any nested relational algebra, r-based algebras are also n-based. We call strict n-based algebras the languages that are n-based but are not r-based.

The remainder of this section is organized as follows. In Subsection VI-A we analyze the relationships existing between languages and EGR supports. In Subsection VI-B we introduce a n-based language, obtained by extending GRA($\Phi$). In Subsection VI-C we prove that the proposed language is n-based and we study the equivalence between this language and GRA($\Phi$).

A. Relationship between languages and EGR supports

Given two semantic languages, the relationships existing between the supports on which they are based allow us to detect some relationships existing between the expressive power of such languages. In order to formalize these notions, the concept of equivalence between languages should be introduced.

Definition 8: Let $L_1 = L_{\Omega_1}$ and $L_2 = L_{\Omega_2}$ be two syntactic languages. Let $\Phi$ be a decidable theory, admitting variable elimination and closed under complementation. Let $S(\Phi, \Theta_1)$ and $S(\Phi, \Theta_2)$ be two EGR supports. Let $t \in \{r, n\}$. $L_1(\Phi, \Theta_1)$ is t-contained in $L_2(\Phi, \Theta_2)$ (denoted by $L_1(\Phi, \Theta_1) \subseteq_t L_2(\Phi, \Theta_2)$) iff for each query $q \in L_1(\Phi, \Theta_1)$ there exists a query $q' \in L_2(\Phi, \Theta_2)$ such that for each input generalized relation $r_i \in S(\Phi, \Theta_1)$, $i = 1, \dots, n$, a generalized relation $r'_i \in S(\Phi, \Theta_2)$ exists such that $r_i \equiv_t r'_i$ and $q(r_1, \dots, r_n) \equiv_t q'(r'_1, \dots, r'_n)$. $L_1(\Phi, \Theta_1)$ is t-equivalent to $L_2(\Phi, \Theta_2)$ (denoted by $L_1(\Phi, \Theta_1) \equiv_t L_2(\Phi, \Theta_2)$) iff $L_1(\Phi, \Theta_1) \subseteq_t L_2(\Phi, \Theta_2)$ and $L_2(\Phi, \Theta_2) \subseteq_t L_1(\Phi, \Theta_1)$. □

Note that, in the previous definition of equivalence, equivalent expressions take equivalent input relations. We now analyze the expressive power of a constraint language $L(\Phi)$ with respect to different EGR supports (proofs of the following results are presented in [6]).

Proposition 4: Let $L_1 = L_{\Omega_1}$ and $L_2 = L_{\Omega_2}$ be two syntactic languages. Let $\Phi$ be a decidable theory admitting variable elimination and closed under complementation. Let $t \in \{r, n\}$. The following facts hold:
1. If $L_i(\Phi)$ is t-based, then for all $S(\Phi, \Theta_1)$, $S(\Phi, \Theta_2)$, $S(\Phi, \Theta_1) \subseteq_t S(\Phi, \Theta_2)$ iff $L_i(\Phi, \Theta_1) \subseteq_t L_i(\Phi, \Theta_2)$.
2. If $L_1(\Phi, \Theta_1) \subseteq_t L_2(\Phi, \Theta_2)$ then $S(\Phi, \Theta_1) \subseteq_t S(\Phi, \Theta_2)$. □

Since GRA($\Phi$) is r-based, Proposition 4 implies that queries that can be expressed in GRA($\Phi, \{\wedge\}$) can also be expressed in GRA($\Phi, \{\wedge, \vee\}$), since $S(\Phi, \{\wedge\}) \subseteq_r S(\Phi, \{\wedge, \vee\})$. Another interesting property is stated by the following proposition.


Proposition 5: Let $L = L_\Omega$ be a syntactic language. Let $\Phi$ be a decidable theory admitting variable elimination and closed under complementation. Let $t \in \{r, n\}$. Let $S(\Phi, \Theta_1)$ and $S(\Phi, \Theta_2)$ be two EGR supports. If $L(\Phi)$ is t-based, then for all $q \in L(\Phi)$, for all $r_1, \dots, r_n \in S(\Phi, \Theta_1)$ and for all $r'_1, \dots, r'_n \in S(\Phi, \Theta_2)$ such that $r'_i \equiv_t r_i$, $q(r_1, \dots, r_n) \equiv_t q(r'_1, \dots, r'_n)$ holds. □

Proposition 5 specifies that queries expressed in a t-based language are independent of the particular representation given to t-equivalent generalized relations. Note that the previous propositions, as well as Definition 7, imply that the semantics of algebra operators is independent of the chosen support.

B. Language definition

In the following we present a n-based algebra for constraint databases that we call Extended Generalized Relational Algebra, since it is obtained by extending the generalized relational algebra with new operators. This language has been designed to manipulate generalized tuples from two different points of view, assuming that the nested semantics is assigned to generalized relations. There are two ways of manipulating generalized relations:

1. Set operators. They apply a certain computation to groups of relational tuples, each represented by the extension of a generalized tuple. Consider a generalized relation $R(X, Y)$ where each generalized tuple represents a rectangle. Each tuple has the form $X \ge a_1 \wedge X \le a_2 \wedge Y \ge b_1 \wedge Y \le b_2$. If we want to know which rectangles are contained in a given space, each constraint must be interpreted as a single object and a subset of the input rectangles must be returned as query answer.

2. Tuple operators. They apply a certain relational computation to generalized relations and assign a given nested representation to the result. As an example of application, consider again a generalized relation $R(X, Y)$ where each generalized tuple represents a rectangle. The detection of the set of points contained in the intersection of each rectangle with a given spatial object is a typical tuple operation. Note that, under the nested semantics, tuple operators apply computations to relational tuples, nested inside sets, represented by generalized tuples. This approach greatly simplifies nested computation.

We believe that both types of operators are useful when dealing with constraint databases, since they correspond to two complementary types of generalized tuple manipulation. The new syntactic language is denoted by EGRA. EGRA operators are the following:
• Tuple operators, except complement, are exactly the operators introduced in Table III.

The EGRA complement operator always returns a generalized relation which is relationally equivalent to the generalized relation returned by the GRA complement operator (when both operators are applied to the same generalized relation). However, such resulting relations

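TABLE VIII
EGRA operators

Each entry lists: operator name; syntax $e$; restrictions; semantics $r = \mu(e)(r_1, \dots, r_n)$, $n \in \{1, 2\}$ (a).

Tuple operators

atomic relation
  Syntax: $R_1$
  Restrictions: $\alpha(e) = \alpha(R_1)$
  Semantics: $r = r_1$

selection
  Syntax: $\sigma_P(R_1)$
  Restrictions: $\alpha(P) \subseteq \alpha(R_1)$, $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{t \wedge P : t \in r_1, ext(t \wedge P) \ne \emptyset\}$

renaming
  Syntax: $\varrho_{[A|B]}(R_1)$
  Restrictions: $A \in \alpha(R_1)$, $B \notin \alpha(R_1)$, $\alpha(e) = (\alpha(R_1) \setminus \{A\}) \cup \{B\}$
  Semantics: $r = \{t[A \mid B] : t \in r_1\}$

projection
  Syntax: $\Pi_{[X_{i_1}, \dots, X_{i_p}]}(R_1)$
  Restrictions: $\alpha(R_1) = \{X_1, \dots, X_m\}$, $\alpha(e) = \{X_{i_1}, \dots, X_{i_p}\}$, $\alpha(e) \subseteq \alpha(R)$
  Semantics: $r = \{\Pi_{[X_{i_1}, \dots, X_{i_p}]}(t) \mid t \in r_1, ext(\Pi_{[X_{i_1}, \dots, X_{i_p}]}(t)) \ne \emptyset\}$

natural join
  Syntax: $R_1 \bowtie R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) \cup \alpha(R_2)$
  Semantics: $r = \{t_1 \wedge t_2 : t_1 \in r_1, t_2 \in r_2, ext(t_1 \wedge t_2) \ne \emptyset\}$

complement
  Syntax: $\neg R$
  Restrictions: $\alpha(e) = \alpha(R)$
  Semantics: $r = \{t_1 \vee \dots \vee t_m \mid t_1 \vee \dots \vee t_m$ is the disjunctive normal form of $\neg t_1 \wedge \dots \wedge \neg t_n$, $r_1 = \{t_1, \dots, t_n\}$, $ext(t_i) \ne \emptyset$, $i = 1, \dots, m\}$

Set operators

union
  Syntax: $R_1 \cup R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) = \alpha(R_2)$
  Semantics: $r = \{t : t \in r_1$ or $t \in r_2\}$

set difference
  Syntax: $R_1 \setminus^s R_2$
  Restrictions: $\alpha(e) = \alpha(R_1) = \alpha(R_2)$
  Semantics: $r = \{t : t \in r_1, \nexists t' \in r_2 : ext(t) = ext(t')\}$

set complement
  Syntax: $\neg^s R_1$
  Restrictions: $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{not\ t$ (b) $: t \in r_1, ext(not\ t) \ne \emptyset\}$

set selection
  Syntax: $\sigma^s_{(Q_1, Q_2, \subseteq)}(R_1)$
  Restrictions: $\alpha(Q_1) \subseteq \alpha(Q_2)$, $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{t : t \in r_1, ext(Q_1(t)) \subseteq ext(\Pi_{[\alpha(Q_1)]}(Q_2(t)))\}$

  Syntax: $\sigma^s_{(Q_1, Q_2, (\cap \ne \emptyset))}(R_1)$
  Restrictions: $\alpha(Q_1) = \alpha(Q_2)$, $\alpha(e) = \alpha(R_1)$
  Semantics: $r = \{t : t \in r_1, ext(Q_1(t)) \cap ext(Q_2(t)) \ne \emptyset\}$

(a) We assume that $r_i$ does not contain inconsistent generalized tuples, $i = 1, \dots, n$.
(b) The expression $not\ t$ represents the disjunctive normal form of the formula $\neg t$.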



are not nested equivalent.

Set operators are the following:

1. Set difference. Given two generalized relations $r_1$ and $r_2$, this operator returns all generalized tuples contained in $r_1$ for which there does not exist an equivalent generalized tuple contained in $r_2$. This is the usual difference operation in nested relational databases [1], [2].

2. Set complement. Given a generalized relation $r$, this operator returns a generalized relation containing a generalized tuple $t'$ for each generalized tuple $t$ contained in $r$; $t'$ is the disjunctive normal form of the formula $\neg t$ [11].

3. Set selection. This operator selects from a generalized relation all the generalized tuples satisfying a certain condition. The condition is of the form $(Q_1, Q_2, \theta)$, where $\theta \in \{\subseteq, (\cap \ne \emptyset)\}$ and $Q_1$ and $Q_2$ are either:
   - a generalized tuple $P$ on the chosen support, or
   - expressions contained in $L_\Omega$, where $\Omega = \{t/0, \Pi_{[X_1, \dots, X_n]}/1\}$. Here $t$ represents the input generalized tuple; the interpretation of $\Pi_{[X_1, \dots, X_n]}$ is a function taking a generalized tuple $t'$ and returning the projection of $t'$ on variables $X_1, \dots, X_n$.

   In order to simplify the notation, in the following $Q_1$ and $Q_2$ are used to represent both the syntactic expressions and their semantic functions. The set selection operator with condition $(Q_1, Q_2, \theta)$, applied to a generalized relation $r$, selects from $r$ only the generalized tuples $t$ for which the relation $\theta$ holds between $ext(Q_1(t))$ and $ext(Q_2(t))$. When a condition $C$ is satisfied by a generalized tuple $t$, we denote this fact by $C(t)$. The possible meanings of the $\theta$ operators are the following:
   - $\theta \equiv \subseteq$: in this case, we require that $\alpha(Q_1) \subseteq \alpha(Q_2)$. It selects all generalized tuples $t$ in $r$ such that $ext(Q_1(t)) \subseteq ext(\Pi_{[\alpha(Q_1)]}(Q_2(t)))$.
   - $\theta \equiv (\cap \ne \emptyset)$: in this case, we require that $\alpha(Q_1) = \alpha(Q_2)$. It selects all generalized tuples $t$ in $r$ such that $ext(Q_1(t)) \cap ext(Q_2(t)) \ne \emptyset$.

   Note that, since the considered theory $\Phi$ is decidable, set selection conditions are also decidable.

Table VIII presents set and tuple operators, according to the notation introduced in Section III. Note that, in order to guarantee operator closure, EGRA operators can only be applied to generalized relations belonging to the EGR support $S(\Phi, \{\wedge, \vee\})$, where $\Phi$ is a logical theory admitting variable elimination and closed under complementation. Thus, from now on, EGRA($\Phi$) should be interpreted as a short form for EGRA($\Phi, \{\wedge, \vee\}$).

Example 6: Table IX shows examples of spatial and temporal queries in EGRA(P) and EGRA(D), respectively. Generalized relations are interpreted as in Example 2. ◇

Several derived operators can be defined. Clearly, all GRA($\Phi$) derived operators can also be seen as EGRA($\Phi$) derived operators. However, by using set operators, other derived operators can be defined, whose semantics is described in Table X. Proofs are presented in [6].
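The behaviour of the set operators can be sketched on the rectangle relation used earlier in this subsection. The Python code below is a simplified illustration, not the operators of Table VIII in full generality: it assumes that every generalized tuple is a non-empty closed rectangle, that $Q_1$ is the input tuple $t$ and $Q_2$ is a fixed rectangle $P$, and the names `Box`, `set_select_subset`, `set_select_meets`, and `set_difference` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Assumed encoding of X >= x_lo and X <= x_hi and Y >= y_lo and Y <= y_hi."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float


def contains(outer: Box, inner: Box) -> bool:
    """ext(inner) is a subset of ext(outer)."""
    return (outer.x_lo <= inner.x_lo and inner.x_hi <= outer.x_hi
            and outer.y_lo <= inner.y_lo and inner.y_hi <= outer.y_hi)


def intersects(a: Box, b: Box) -> bool:
    """ext(a) and ext(b) share at least one point."""
    return (a.x_lo <= b.x_hi and b.x_lo <= a.x_hi
            and a.y_lo <= b.y_hi and b.y_lo <= a.y_hi)


def set_select_subset(r: set, p: Box) -> set:
    """Set selection with condition (t, P, subset): whole tuples are kept or dropped."""
    return {t for t in r if contains(p, t)}


def set_select_meets(r: set, p: Box) -> set:
    """Set selection with condition (t, P, non-empty intersection)."""
    return {t for t in r if intersects(t, p)}


def set_difference(r1: set, r2: set) -> set:
    """Tuples of r1 with no tuple of r2 having the same extension."""
    # For non-empty closed boxes, equal extensions mean equal bounds.
    return r1 - r2


R = {Box(0, 1, 0, 1), Box(2, 5, 2, 5), Box(8, 9, 8, 9)}
P = Box(0, 3, 0, 3)
print(sorted(set_select_subset(R, P)))
print(sorted(set_select_meets(R, P)))
```

Each result is a subset of the input tuples, never a clipped rectangle, which is what distinguishes these set operators from the tuple selection $\sigma_P(R)$.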


TABLE IX

Examples of spatial and temporal queries in EGRA(P ) and EGRA(D

Type

RANGE INTERSECTION QUERY RANGE CONTAINMENT QUERY ADJACENT QUERY SPATIAL JOIN (intersection based)

Query

select all spatial objects in R that intersect the region of space identi ed2 by a given rectangle rt 2 E select all spatial objects in R that are contained in the region of space identi ed by a given rectangle rt 2 E 2 select all spatial objects in R that are adjacent to a spatial object sp 2 E 2 generate all pairs of spatial objects (r;s) r 2 R; s 2 S , such that r intersects s

(A) Spatial Queries EGRA expression (st;P;(16=;)) (R)

P  Cconvex (rt) (P ) = fX;Y g

(st;P;) (R)

P  Cconvex (rt) (P ) = fX;Y g P  Ccomposite (sp) (P ) = fX;Y g c1  (QInt(t);QInt (P ); (1= ;)) c2  (QBnd (t);QBnd (P ); (16= ;)) c  (Q1 (t);Q2 (t); (16= ;)) Q1 (t)  [X;Y ] (t) Q2 (t)  %[X jX ;Y jY ] ([X ;Y ] (t)) c1  (Q1;1 (t);Q1;2 (t); (1= ;))) Q1;1 (t)  QInt([X;Y ] (t)) Q1;2 (t)  QInt(g(t)) c2  (Q2;1 ;Q1;2 (t); (16= ;)) Q2;1 (t)  QBnd ([X;Y ] (t)) Q2;2 (t)  QBnd (g(t)) g(t)  %[X jX ;Y jY ] ([X ;Y ] (t))

cs1 (cs2 (R)) cs (R 1 %[X jX ;Y jY ] (S )) 0

0

0

SPATIAL generate all pairs of spatial JOIN (adjacency objects (r;s); r 2 R; s 2 S , cs1 (cs2 (R 1%[X jX ;Y jY ] (S ))) based) such that r is adjacent to s 0

DIFFERENCE QUERY COMPLEMENT QUERY

Type INTERVAL QUERY RANGE QUERY TEMPORAL JOIN

0

select all spatial objects in R s for which there are no spa- c (([X ] (R) ns [X ] (S )) 1 R0 ) tial objectsin S with the same R0  %[X jX ;Y jY ] (R) projection on X compute the portions of E 2 :sR that are the complement of a spatial object of R 0

Query select the complete information about all trains that leave after time t (expressed in minutes from time 00 : 00) to station a select the complete information about all trains that arrive in the interval i select the complete information about all trains that arrive at the station S when another train to destination d is standing by at the same station

Conditionsa

0

(B) Temporal Queries EGRA expression (sc1 ^c2) (A) cs (A)

[N;I;F;T ] (cs2 (A 1 A0 )) A0  %[N N ;F jF ;I jI ;T jT ] (cs1 (A))) j

0

0

0

0

0

0

0

0

0

0

0

c  ([X ] (t);%[X jX ]([X ](t)); =) 0

0

Conditionsb

P  QPost (Cinstant(t)) P 0  (T = a) Q(t)  QStP ([I ](t)) (P ) = fI g c1  (Q(t)0 ;P; ) c2  (t;P ; (16= ;)) P  Cinterval (i) (P ) = fI g c  (QStP ([I ](t));P; (16= ;)) P  (T = d) c1  (t;P; (16= ;)) c2  (QStP ([I ] (t));Q(t); 16= ;) Q(t)  %[I jI ] ([I ] (t)) 0

0

aIn this column the following symbols are used:  Cconvex () and Ccomposite (): see Table I;  QBnd and QInt represent a short form for queries retrieving the boundary and the interior of a spatial object respectively [40]. bIn this column the following symbols are used:  Cinstant () and Cinterval(): see Table II;  QStP (t): it is a short form for the query retrieving the set of instants that represent the starting points of all contiguous intervals

contained in t;  QPost (t): it is a short form for the query retrieving the set of instants that follow the interval represented by t.

It can be easily shown that EGRA($\Phi$) operators are independent, i.e., the semantic function of no operator can be expressed as the composition of the semantic functions associated with other operators [6].

C. Properties of EGRA($\Phi, \{\wedge, \vee\}$)

In the following we prove that:

1. EGRA($\Phi$) is a n-based algebra.
2. GRA($\Phi, \Theta_1$) $\not\equiv_r$ EGRA($\Phi, \{\wedge, \vee\}$), and therefore GRA($\Phi, \Theta_1$) $\not\equiv_n$ EGRA($\Phi, \{\wedge, \vee\}$), for all $\Theta_1$. However, we introduce a weaker notion of equivalence and we show that GRA($\Phi, \Theta_1$) and EGRA($\Phi, \{\wedge, \vee\}$), under specific conditions for $\Theta_1$, are equivalent under this new definition.

TABLE X

EGRA derived operators

Op. name

set intersection derived set selection

Syntax e R 1 \ s R2 (sQ1 ;Q2 ;) (R)

Restrictions

(e) = (R1 ) = (R2 ) (Q1)  (Q2) (e) = (R) (sQ1 ;Q2 ;6) (R) (Q1)  (Q2) (e) = (R) (sQ1 ;Q2 ;6) (R) (Q1)  (Q2) (e) = (R) (sQ1 ;Q2 ;1=;) (R) (Q1) = (Q2) (e) = (R) Cs 1 ^C2 (R) (e) = (R) Cs 1 _C2 (R) (e) = (R) :s C1 (R) (e) = (R) (sQ1 ;Q2 ;=) (R) (Q1) = (Q2) (e) = (R) (sQ1 ;Q2 ;) (R) (Q1)  (Q2) (e) = (R) (sQ1 ;Q2 ;) (R) (Q1)  (Q2) (e) = (R) (sQ1 ;Q2 ;) (R) it depends on 

Derived expression

R1 ns (R1 ns R2 ) (sQ2 ;Q1 ;)) (R)

R ns (sQ1 ;Q2 ;) (R) R ns (sQ2 ;Q1 ;) (R)

R ns (sQ1 ;Q2 ;16=;) (R)

Cs 1 (R) \ Cs 2 (R) Cs 1 (R) [ Cs 2 (R) R ns Cs 1 (R) (sQ1 ;Q2 ;)^(Q2 ;Q1 ;) (R)

(sQ1 ;Q2 ;)^:(Q1 ;Q2 ;=) (R) (sQ2 ;Q1 ;)^:(Q1 ;Q2 ;=) (R)

%[Xj1 =Xj ] ([Xj1 ] (cs2 (cs1 (Q01 1 Q02 ))))a Q010 and Q02 are such that Qi (r) = f%[Xj jX i ] (t) ^ %[Xj jX i ] (Qi(t))jt 2 rg i = 1; 2 j j c1  ([Xj1 ] (t);%[Xj2 jX 1 ] ([Xj2 ](t)); =) j c2  ([X1 ] (t);%[X 2 j 1 ] ([X2 ](t));) j

j Xj

j

a We denote by [Xj ] the set of variables [X1 ; :::;Xn ] and by [Xj jYj ] the set of renamings [X1 jY1 ;:::; Xn jYn ].

3. Under specific assumptions, the data complexity of EGRA($\Phi, \{\wedge, \vee\}$) is equal to the data complexity of GRA($\Phi, \{\wedge, \vee\}$).

C.1 EGRA($\Phi$) is a n-based language

In order to show that EGRA($\Phi$) is n-based, following Definition 7, we present a mapping from EGRA expressions to nested relational algebra expressions, satisfying Definition 7. Let $D$ be a domain of values. The nested relational model deals with objects of type:

$$\tau ::= D \mid \langle A_1 : \tau, \dots, A_n : \tau \rangle \mid \{\tau\}$$

where $A_1, \dots, A_n$ are attribute names. In the literature, several nested relational algebras have been proposed, most of which are equivalent. A basic nested relational algebra consists of the following operators:⁹
1. the classical relational operators extended to nested relations: union ($\cup$), difference ($\setminus$), selection ($\sigma$), projection ($\Pi$), and join ($\bowtie$);
2. two restructuring operators: nest and unnest.

The unnest operator transforms a relation into one which is less deeply nested by concatenating each element in the set attribute being unnested to the remaining attributes in the relation. The nest operator creates partitions based on the construction of equivalence classes. Two tuples are equivalent if they have the same values for the attributes which are not being nested. For each equivalence class, a single tuple is placed into the result. The attributes being nested are used to generate a nested relation containing all tuples in the equivalence class for those attributes.

⁹ Other proposed nested relational algebras also contain a powerset operator [1]. Since it is not needed for the development of this paper, we omit its description.

Theorem 1: EGRA($\Phi$) is a n-based algebra.

Proof: (Sketch) It is possible to show that for each EGRA($\Phi$) query there exists an equivalent nested relational algebra query. Let $D$ be the domain of $\Phi$. The proof, presented in [6], is based on the following translation of generalized relations and generalized tuples into nested relations:
• Each generalized relation $R$ with schema $\{X_1, \dots, X_n\}$ can be seen as a nested relation of type $\{\langle A : \{\langle X_1 : D, \dots, X_n : D \rangle\} \rangle\}$, where $D$ is the domain of the chosen theory.
• Given a generalized tuple $P$ with schema $\{X_1, \dots, X_n\}$, $P$ can be interpreted as the nested relation $r(P)$ defined as $\{\langle A_P : \langle X_1 : a_1, \dots, X_n : a_n \rangle \rangle \mid X_1 = a_1 \wedge \dots \wedge X_n = a_n \in ext(P)\}$. Note that the type of $r(P)$ is $\{\langle A_P : \langle X_1 : D, \dots, X_n : D \rangle \rangle\}$.
• Given a generalized tuple $P$ with schema $\{X_1, \dots, X_n\}$, $P$ can also be interpreted as the nested relation $n(P)$ containing only one element, represented by the set $ext(P)$. Thus, $n(P)$ coincides with the set $\{\langle A_P : \{\langle X_1 : a_1, \dots, X_n : a_n \rangle \mid X_1 = a_1 \wedge \dots \wedge X_n = a_n \in ext(P)\} \rangle\}$. The type of $n(P)$ is $\{\langle A_P : \{\langle X_1 : D, \dots, X_n : D \rangle\} \rangle\}$.

Using this representation, for each EGRA($\Phi$) query it is possible to construct an equivalent nested relational algebra query.

C.2 Equivalence results

It is immediate to prove the following proposition.

Proposition 6: GRA($\Phi, \Theta_1$) $\subseteq_r$ EGRA($\Phi, \{\wedge, \vee\}$).

Proof: It is simple to show that, given some generalized relations $r_1, \dots, r_n$, EGRA($\Phi$) tuple operators, when applied to $r_1, \dots, r_n$, return a generalized relation that is r-equivalent to the generalized relation obtained by applying the corresponding GRA($\Phi$) operator to $r_1, \dots, r_n$. Thus, GRA($\Phi, \{\wedge, \vee\}$) $\subseteq_r$ EGRA($\Phi, \{\wedge, \vee\}$). Moreover, it can be shown that $S(\Phi, \Theta_1) \subseteq_r S(\Phi, \{\wedge, \vee\})$, for all $\Theta_1$. Since GRA($\Phi$) is r-based, from Proposition 4 it follows that GRA($\Phi, \Theta_1$) $\subseteq_r$ GRA($\Phi, \{\wedge, \vee\}$). The thesis follows by transitivity from the previous results.

Note that, since the semantic function associated with the complement in GRA($\Phi$) always returns a generalized relation which is not n-equivalent to the generalized relation returned by the semantic function associated with the complement in EGRA($\Phi$) (when both semantic functions are applied to the same input generalized relation), it follows that GRA($\Phi, \Theta_1$) $\not\subseteq_n$ EGRA($\Phi, \{\wedge, \vee\}$).

Now we analyze the opposite containment. A necessary condition for expressing an EGRA($\Phi, \{\wedge, \vee\}$) query in GRA($\Phi, \Theta_2$) is to modify the input database, coding in some way each generalized tuple as a set. The aim of this section is to prove that, due to this transformation, EGRA($\Phi, \{\wedge, \vee\}$) and GRA($\Phi, \Theta_2$) are not r-equivalent, whatever $\Theta_2$ is. To prove this result, a weaker notion of equivalence is first introduced. This new equivalence relation is called weak, since it relaxes the conditions under which the usual equivalence is defined (see Definition 8). The basic idea of weak equivalence is that of coding in some way the input of an EGRA($\Phi, \{\wedge, \vee\}$) query, before applying the corresponding GRA($\Phi, \Theta_2$) query. After that, a decoding function should be applied to the result, to remove the action of the encoding function.
A similar approach has been taken in [36] and in [41] to prove results about the nested relational algebra and the relational algebra. Encoding and decoding functions can be formalized as follows.

Definition 9: An encoding function of type $(\Phi, \Omega_1, \Omega_2)$ is a total computable function $f$ from $S(\Phi, \Omega_1)$ to $S(\Phi, \Omega_2)$. A decoding function of type $(\Phi, \Omega_1, \Omega_2)$ is a partial computable function $g$ from $S(\Phi, \Omega_2)$ to $S(\Phi, \Omega_1)$. □

Weak equivalence can be defined as follows.

Definition 10: Let $L_1$ and $L_2$ be two syntactic languages. Let $S(\Phi, \Omega_1)$ and $S(\Phi, \Omega_2)$ be two EGR supports. Let $t \in \{r, n\}$. $L_1(\Phi, \Omega_1)$ is weakly t-contained in $L_2(\Phi, \Omega_2)$ (denoted by $L_1(\Phi, \Omega_1) \subseteq^{w}_{t} L_2(\Phi, \Omega_2)$) iff there exist an encoding function $f$ of type $(\Phi, \Omega_1, \Omega_2)$ and a decoding function $g$ of type $(\Phi, \Omega_1, \Omega_2)$ such that for each query $q \in L_1(\Phi, \Omega_1)$ there exists a query $q' \in L_2(\Phi, \Omega_2)$ with the following property:

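[Figure 2: diagram. Query $q$ maps $r_1, \ldots, r_n$ to $r_{n+1}$; the encoding function $f$ maps $r_1, \ldots, r_n$ to $r'_1, \ldots, r'_n$; query $q'$ maps $r'_1, \ldots, r'_n$ to $r'_{n+1}$; the decoding function $g$ maps $r'_{n+1}$ back to $r_{n+1}$.]

Fig. 2. Graphical representation of weak containment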

for all relations $r_i \in S(\Phi, \Omega_1)$, $i = 1, \ldots, n$, $q(r_1, \ldots, r_n) \equiv_t g(q'(f(r_1), \ldots, f(r_n)))$.

$L_1(\Phi, \Omega_1)$ is weakly t-equivalent to $L_2(\Phi, \Omega_2)$ (denoted by $L_1(\Phi, \Omega_1) \equiv^{w}_{t} L_2(\Phi, \Omega_2)$) iff $L_1(\Phi, \Omega_1) \subseteq^{w}_{t} L_2(\Phi, \Omega_2)$ and $L_2(\Phi, \Omega_2) \subseteq^{w}_{t} L_1(\Phi, \Omega_1)$. □

Fig. 2 graphically represents weak containment. It is simple to show that if $L_1(\Phi, \Omega_1) \subseteq_t L_2(\Phi, \Omega_2)$, then $L_1(\Phi, \Omega_1) \subseteq^{w}_{t} L_2(\Phi, \Omega_2)$. Moreover, if $L_1(\Phi, \Omega_1) \subseteq^{w}_{t} L_2(\Phi, \Omega_2)$ and functions $f$ and $g$ can be represented in $L_2(\Phi, \Omega_2)$, then $L_1(\Phi, \Omega_1) \subseteq_t L_2(\Phi, \Omega_2)$.

In the following we prove that EGRA($\Phi, \{\wedge, \vee\}$) $\subseteq^{w}_{n}$ GRA($\Phi, \Omega_2$), assuming that $\wedge \in \Omega_2$ (thus, either $\Omega_2 = \{\wedge\}$ or $\Omega_2 = \{\wedge, \vee\}$). To simplify the presentation, we suppose that $\Omega_2 = \{\wedge\}$; the other case derives from it.

The chosen encoding and decoding functions of type $(\Phi, \Omega_1, \Omega_2)$ are presented in Table XI. Assuming, without loss of generality, that we deal with a countable set of variables, the definitions are given with respect to a countable set of variables $\tilde{N}$, used only to assign identifiers to generalized tuples. The encoding function transforms a generalized relation $r \in S(\Phi, \{\wedge, \vee\})$ into a generalized relation $r' \in S(\Phi, \{\wedge\})$, such that each generalized tuple of $r$ is contained in $r'$ together with a new variable identifier, represented by a constraint admitting only one solution. Each generalized tuple of $r$ containing disjunctions is divided in $r'$ into several generalized tuples, all having the same identifier. The decoding function projects the input relation on all variables, except the ones contained in $\tilde{N}$, if any. If more than one tuple in the input relation has the same values for the variables in $\tilde{N}$, the disjunction of such tuples is taken.

Table XII shows, for each EGRA($\Phi, \{\wedge, \vee\}$) basic query, the corresponding weakly equivalent GRA($\Phi, \Omega_2$) query. The two lemmas presented below are used in the proof of Theorem 2. See [6] for their complete proofs.

Lemma 1: Let $r_i \in S(\Phi, \Omega)$ such that $\alpha(r_i) \cap \tilde{N} \neq \emptyset$, $i = 1, \ldots, n$, $n \in \{1, 2\}$.
Let $q$ be the query associated with one of the GRA expressions listed in the second column of Table XII. Let $f$ and $g$ be as defined in Table XI. Then, $g(q(f(g(r_1)), \ldots, f(g(r_n)))) \equiv_n g(q(r_1, \ldots, r_n))$. □

Lemma 2: Let $f$ and $g$ be as defined in Table XI. Let $\Omega_1$ and $\Omega_2$ be two signatures such that $\Omega_1 = \{\wedge, \vee\}$ and $\wedge \in \Omega_2$. For each EGRA($\Phi, \Omega_1$) operator semantic function $f_{OP}$ of arity $n$, $n \in \{1, 2\}$, there exists a GRA($\Phi, \Omega_2$) query $q$ such that, for all $r_1, \ldots, r_n$ with $r_i \in S(\Phi, \Omega_1)$, $f_{OP}(r_1, \ldots, r_n) \equiv_n g(q(f(r_1), \ldots, f(r_n)))$.

Proof: (Sketch) Let $R_1, \ldots, R_n$ be the names of the generalized relations belonging to the schema we consider.

TABLE XI
Encoding and decoding functions, used by the equivalence theorem

Functions of type $(\Phi, \{\wedge, \vee\}, \{\wedge\})$

Encoding $f$:

$f(r) = \bigcup_{t \in r'} f'(t)$, where, for $r = \{t_1, \ldots, t_n\}$,

$r' = \{ t_1^{m_1}, \ldots, t_n^{m_n} \mid t_k^{m_k} \equiv N = m_k \wedge t_k,\ N \in \tilde{N},\ N \notin \alpha(r),\ m_k \in D,\ k = 1, \ldots, n,$ and for all $i, j$, $1 \leq i, j \leq n$, $i \neq j$: $ext(t_i) \neq ext(t_j) \rightarrow m_i \neq m_j \}$

$f'(t) = \{ N = m \wedge t_1, \ldots, N = m \wedge t_n \mid t \equiv N = m \wedge (t_1 \vee \ldots \vee t_n) \}$

$D$ is the domain of the considered theory $\Phi$.

Decoding $g$:

$g(r) = \{ \pi_{[\alpha(r) \setminus \tilde{N}]}(t_1) \vee \ldots \vee \pi_{[\alpha(r) \setminus \tilde{N}]}(t_n) \mid t_1, \ldots, t_n \in r,\ \pi_{[\tilde{N}]}(t_1) = \ldots = \pi_{[\tilde{N}]}(t_n),\ \not\exists\, t_{n+1} \in r$ with $\pi_{[\tilde{N}]}(t_{n+1}) = \pi_{[\tilde{N}]}(t_1)$ such that $ext(t_{n+1}) \neq ext(t_i)$, $i = 1, \ldots, n \}$
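The encoding and decoding functions of Table XI, and the weak equivalence they support, can be illustrated on a toy model. The following is only a minimal sketch under strong assumptions that are not part of the paper: a single variable ranging over the integers, conjunctive constraints of the form lo <= X <= hi written as pairs `(lo, hi)`, and generalized tuples with disjunctions written as sorted tuples of such pairs, with distinct input tuples having distinct extensions. The names `ext`, `encode`, `decode`, `egra_select_contained` and `gra_select_contained` are hypothetical. The last function mimics the identifier-based GRA expression that Table XII gives for the set selection with a containment condition.

```python
# Toy model (assumption of this sketch): one integer variable X.
# A conjunctive constraint lo <= X <= hi is the pair (lo, hi);
# a generalized tuple with disjunctions is a sorted tuple of such pairs.

def ext(t):
    """Set of integers satisfying the generalized tuple t."""
    return frozenset(x for lo, hi in t for x in range(lo, hi + 1))

def encode(r):
    """Sketch of f: attach an identifier to each tuple and split disjunctions."""
    ids = {}    # extension -> identifier, so different extensions get different ids
    rows = set()
    for t in sorted(r):
        m = ids.setdefault(ext(t), len(ids) + 1)
        for interval in t:
            rows.add((m, interval))     # one conjunctive row "N = m and interval"
    return rows

def decode(rows):
    """Sketch of g: drop the identifier, take the disjunction of rows sharing it."""
    groups = {}
    for m, interval in rows:
        groups.setdefault(m, set()).add(interval)
    return {tuple(sorted(g)) for g in groups.values()}

def egra_select_contained(r, p):
    """Set selection on the nested view: keep tuples whose extension lies in p."""
    return {t for t in r if ext(t) <= ext((p,))}

def gra_select_contained(rows, p):
    """Identifier-based counterpart: ids of all rows, minus ids of rows having
    a part outside p, joined back with the encoded relation."""
    lo_p, hi_p = p
    outside = {m for m, (lo, hi) in rows if lo < lo_p or hi > hi_p}
    keep = {m for m, _ in rows} - outside
    return {(m, interval) for m, interval in rows if m in keep}

r = {((1, 3), (7, 9)), ((4, 5),), ((0, 2), (4, 4))}
p = (0, 5)
assert decode(encode(r)) == r
assert decode(gra_select_contained(encode(r), p)) == egra_select_contained(r, p)
```

In this sketch the query on the encoded relation has to manipulate identifiers explicitly, whereas the set selection on the nested view does not, which is the usability gap the weak equivalence result is meant to capture.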

Let $\alpha(R'_i) = \alpha(R_i) \cup \{N\}$, $i = 1, \ldots, n$, $N \in \tilde{N}$. Let $D$ be the domain of $\Phi$. Let $Q'_i$ be the query obtained from query $Q_i$ by inserting variable $N$ in all projection operators. Table XII shows, for each basic EGRA($\Phi, \{\wedge, \vee\}$) query, the weakly equivalent GRA($\Phi, \Omega_2$) query. See [6] for the complete proof of the lemma.

Theorem 2: Let $\Omega_1$ and $\Omega_2$ be two signatures such that $\Omega_1 = \{\wedge, \vee\}$ and $\wedge \in \Omega_2$. Then, EGRA($\Phi, \Omega_1$) $\subseteq^{w}_{n}$ GRA($\Phi, \Omega_2$).

Proof: We prove the thesis by induction on the structure of an EGRA($\Phi, \{\wedge, \vee\}$) query $q$.

Base case: $q$ is an operator semantic function. The thesis follows from Lemma 2.

Inductive step: Let $q \equiv f_{OP}(q_1, q_2)$, where $f_{OP}$ is an operator semantic function and $q_1$ and $q_2$ are queries (the proof assumes $OP$ to be a binary operator; a similar proof holds for unary operators). By inductive hypothesis we know that $q'_1, q'_2 \in$ GRA($\Phi, \Omega_2$) exist such that:

$\forall r_1, \ldots, r_n \in S(\Phi, \Omega_1): \quad q_i(r_1, \ldots, r_n) \equiv_n g(q'_i(f(r_1), \ldots, f(r_n))), \quad i = 1, 2.$

From Theorem 1, we know that EGRA($\Phi$) is n-based. From Proposition 5 and the inductive hypothesis, we obtain that $q(r_1, \ldots, r_n) = f_{OP}(q_1(r_1, \ldots, r_n), q_2(r_1, \ldots, r_n))$ is nested equivalent to $S \equiv f_{OP}(g(q'_1(f(r_1), \ldots, f(r_n))), g(q'_2(f(r_1), \ldots, f(r_n))))$. Let $q'$ be the GRA($\Phi$) query corresponding to $f_{OP}$ in Table XII. By Lemma 2, $S$ is nested equivalent to $g(q'(f(r'_1), f(r'_2)))$, where $r'_i = g(q'_i(f(r_1), \ldots, f(r_n)))$, $i \in \{1, 2\}$. From Lemma 1, we can replace $f(r'_i)$ with $q'_i(f(r_1), \ldots, f(r_n))$, $i = 1, 2$, obtaining that $g(q'(f(r'_1), f(r'_2)))$ is nested equivalent to $g(q'(q'_1(f(r_1), \ldots, f(r_n)), q'_2(f(r_1), \ldots, f(r_n))))$. Note that Lemma 1 can be applied since $q'_i(f(r_1), \ldots, f(r_n))$ satisfies the hypothesis of the lemma. The query $q'(q'_1, q'_2)$ satisfies the thesis.

Note that if $\wedge \notin \Omega_2$, the equivalence does not hold. Indeed, in such a case there does not exist an encoding function of type $(\Phi, \{\wedge, \vee\}, \Omega_2)$. The following corollary presents the final equivalence results about EGRA($\Phi, \Omega_1$) and GRA($\Phi, \Omega_2$).

Corollary 1: Let $\Omega_1$ and $\Omega_2$ be two signatures such that $\Omega_1 = \{\wedge, \vee\}$ and $\wedge \in \Omega_2$. The following facts hold:

1. EGRA($\Phi, \Omega_1$) $\equiv^{w}_{r}$ GRA($\Phi, \Omega_2$).
2. EGRA($\Phi, \Omega_1$) $\not\equiv_r$ GRA($\Phi, \Omega_2$).

Proof:

1. It follows from Proposition 6 and Theorem 2.
2. This result derives from the fact that the proposed encoding and decoding functions cannot be represented in GRA($\Phi, \Omega_2$).

Even if EGRA($\Phi, \{\wedge, \vee\}$) $\equiv^{w}_{r}$ GRA($\Phi, \{\wedge\}$), queries in GRA($\Phi, \{\wedge\}$) are often very complex when compared with the corresponding queries in EGRA($\Phi, \{\wedge, \vee\}$) (see Table XII), even the ones implementing simple user requests. As a consequence, GRA is not adequate as a user language. On the other hand, EGRA($\Phi, \{\wedge, \vee\}$) allows users to easily write useful queries, without being aware of tuple identifiers.

Example 7: The query of Example 5, which in GRA is represented as

$(\pi_{[N]}(R) \setminus \pi_{[N]}(R \setminus \sigma_P(R))) \bowtie R,$

can be simply expressed in EGRA as $\sigma^{s}_{(t, P, \subseteq)}(R)$. ◇

From Theorem 1 and Theorem 2, it is simple to prove that EGRA($\Phi, \{\wedge, \vee\}$) is a strict n-based language.

Corollary 2: EGRA($\Phi$) is a strict n-based algebra.

Proof:

EGRA($\Phi$) is an n-based algebra: it follows from Theorem 1.

EGRA($\Phi$) is not r-based: suppose EGRA($\Phi$) is r-based. Since GRA($\Phi$) is r-based, this means that for each signature $\Omega_2$ and for $\Omega_1 = \{\wedge, \vee\}$, EGRA($\Phi, \Omega_1$) is r-equivalent to GRA($\Phi, \Omega_2$). But this is not true. Indeed:

- If $\wedge \in \Omega_2$, from item (2) of Corollary 1 it follows that EGRA($\Phi, \Omega_1$) is not r-equivalent to GRA($\Phi, \Omega_2$).
- If $\wedge \notin \Omega_2$, then $S(\Phi, \Omega_1)$ is not r-equivalent to $S(\Phi, \Omega_2)$. Therefore, by Proposition 4, EGRA($\Phi, \Omega_1$) is not r-equivalent to GRA($\Phi, \Omega_2$).

Since in both cases we obtain a contradiction, EGRA($\Phi$) is not r-based.

C.3 Data complexity

A constraint query $Q$ has data complexity in the complexity class $C$ if there is a Turing machine that, given an

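TABLE XII
Translation of EGRA expressions in GRA expressions

| EGRA expression | GRA expression |
|---|---|
| $R_i$ | $R'_i$ |
| $\sigma_{P_1 \vee \ldots \vee P_n}(R)$ | $\sigma_{P_1}(R')$ if $n = 1$; $\sigma_{P_1}(R') \cup \ldots \cup \sigma_{P_n}(R')$ otherwise |
| $\varrho_{[A \mid B]}(R)$ | $\varrho_{[A \mid B]}(R')$ |
| $R_1 \bowtie R_2$ | $R'_1 \bowtie \varrho_{[N/N']}(R'_2)$, $N' \in \tilde{N}$ |
| $R_1 \cup R_2$ | $(R'_1 \bowtie \pi_{[N']}(\varrho_{[N/N']}(\sigma_{N = n_1}(R'_1 \cup \neg R'_1)))) \cup (R'_2 \bowtie \pi_{[N']}(\varrho_{[N/N']}(\sigma_{N = n_2}(R'_2 \cup \neg R'_2))))$, $N' \in \tilde{N}$, $n_1 \neq n_2$, $n_1, n_2 \in D$ |
| $\pi_{[\tilde{Y}]}(R)$ | $\pi_{[\tilde{Y} \cup \{N\}]}(R')$ |
| $\neg R$ | $\pi_{[\alpha(R') \setminus \{N\}]}(\neg R') \bowtie \pi_{[N]}(\sigma_{N = n_1}(R' \cup \neg R'))$, $n_1 \in D$ |
| $R_1 \setminus^{s} R_2$ | $(\pi_{[N]}(R'_1) \setminus \pi_{[N]}(Y \bowtie Z)) \bowtie R'_1$, with $X$, $Y$, $Z$ as defined below |
| $\neg^{s}(R)$ | $\pi_{[N]}(R') \bowtie \neg R'$ |
| $\sigma^{s}_{(Q_1, Q_2, \subseteq)}(R)$ | $(\pi_{[N]}(R') \setminus \pi_{[N]}(Q'_1(R') \setminus Q'_2(R'))) \bowtie R'$ (b) |
| $\sigma^{s}_{(Q_1, Q_2, \bowtie \neq \emptyset)}(R)$ | $\pi_{[N]}(Q'_1(R') \bowtie Q'_2(R')) \bowtie R'$ |

In the translation of $R_1 \setminus^{s} R_2$:

$X = R'_1 \times \varrho^{R'_1}_{[R'_2]}(R'_2)$, $N' \in \tilde{N}$

$Y = \pi_{[N, N']}(X) \setminus \pi_{[N, N']}\big(\pi_{[\alpha(R'_1) \cup \{N'\}]}(X) \setminus \varrho^{-1\, R'_1}_{[R'_2]}(\pi_{[(\alpha(X) \setminus \alpha(R'_1)) \cup \{N\}]}(X))\big)$ (a)

$Z = \pi_{[N, N']}(X) \setminus \pi_{[N, N']}\big(\varrho^{-1\, R'_1}_{[R'_2]}(\pi_{[(\alpha(X) \setminus \alpha(R'_1)) \cup \{N\}]}(X)) \setminus \pi_{[\alpha(R'_1) \cup \{N'\}]}(X)\big)$

(a) $\varrho^{R'_1}_{[R'_2]}$ denotes the operation replacing each variable $X$ of $R'_2$ also contained in the schema of $R'_1$ by a new variable $X'$ such that $X' \in \tilde{N}$ iff $X \in \tilde{N}$. Moreover, $\varrho^{-1\, R'_1}_{[R'_2]}$ denotes the operation replacing each variable $X' \notin \tilde{N}$ previously changed by $\varrho^{R'_1}_{[R'_2]}$ by its original name $X$.

(b) If $Q_i$ is a generalized tuple $P$, $Q'_i(R')$ is the query that, for each generalized relation $r$, returns a new generalized relation containing $n$ generalized tuples, where $n$ is the cardinality of $r$. Each generalized tuple is equivalent to $N = m \wedge P$, where $m$ is the generalized tuple identifier of a tuple in $r$.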

input generalized relation $d$, produces some generalized relation representing the output $Q(d)$ and uses time in class $C$, assuming some standard encoding of generalized relations.

The analysis of the data complexity of EGRA($\Phi, \{\wedge, \vee\}$) queries follows from the fact that EGRA($\Phi, \{\wedge, \vee\}$) $\equiv^{w}_{r}$ GRA($\Phi, \Omega_2$), assuming that $\wedge \in \Omega_2$, and from results about the data complexity of GRA($\Phi, \Omega_2$). It is simple to show that the data complexity of the chosen encoding and decoding functions $f$ and $g$ is in class NC [3]. Therefore, by considering Fig. 2, we can deduce that, if the data complexity of GRA($\Phi, \Omega_2$) is in a complexity class $C$ containing or equal to NC, the data complexity of EGRA($\Phi, \{\wedge, \vee\}$) is equal to the data complexity of GRA($\Phi, \Omega_2$).

Proposition 7: Let $\Phi$ be a decidable logical theory, admitting variable elimination and closed under complementation. Suppose that $\wedge \in \Omega_2$. Suppose that the data complexity of GRA($\Phi, \Omega_2$) is in class $C$. Then, the data complexity of EGRA($\Phi, \Omega_1$) is in class $C$ iff NC is contained in $C$. Otherwise, it is in NC. □

For example, from [23], [24] it follows that GRA(RP, $\{\wedge\}$) has NC data complexity when RP is the real polynomial constraint theory. From Proposition 7, it follows that EGRA(RP, $\{\wedge, \vee\}$) also has NC data complexity.

VII. External functions

The introduction of external functions in database languages is an important topic. Functions increase the expressive power of database languages, relying on user-defined procedures, without modifying the language definition. External functions can be considered as library functions, completing the knowledge about a certain application domain.

If we consider constraint query algebras, the introduction of external functions must preserve the closure of the language. The following definition introduces a class of functions for constraint databases that satisfy this property. In the following, DOMgentuple($\Phi, \Omega, S$) is the set of all the possible generalized tuples on $\Phi$ and $\Omega$ having $\alpha(S)$ as schema, where $\alpha(S)$ denotes the set of variables in $S$ and $S$ is a tuple of variables (denoted by $[X_1, \ldots, X_n]$).$^{10}$

Definition 11: Let $\Phi$ be a decidable logical theory and $\Omega$ be a signature. An admissible function $f$ for $\Phi$ and $\Omega$ is a function from DOMgentuple($\Phi, \Omega, S$) to DOMgentuple($\Phi, \Omega, S'$), where $S$ and $S'$ may be different. $S$ is called the input schema of $f$ and is denoted by $is(f)$, whereas $S'$ is called the output schema of $f$ and is denoted by $os(f)$. □

When using external functions, two new operators, called application dependent operators,$^{11}$ can be added to EGRA($\Phi$):

- The family of Apply Transformation operators allows an admissible function to be applied to a generalized relation. Each operator of the family is specified by $AT^{Sr}_{f}$, where $f$ is an admissible function and $Sr$ is a tuple of variables. The result of the application of $AT^{Sr}_{f}$ to a generalized relation $r$, whose schema contains $\alpha(Sr)$, is a new relation obtained from the previous one by replacing each generalized tuple $t$ by a new tuple $t'$. The new tuple $t'$ is obtained from $t$ by modifying the set of values assigned to the variables in $Sr$, according to the application of function $f$.
- The second operator (Application dependent set selection) is similar to the set selection of Table VIII; the only difference is that now the queries specified in the selection condition $C_f$ may contain the operator $AT^{Sr}_{f}$.

$^{10}$ The definition of $S$ as a tuple simplifies the definition of application dependent operators (see Table XIII).

$^{11}$ The term application dependent operators comes from the fact that functions reflect the application requirements.

TABLE XIII
EGRA($\Phi$, $F$) application dependent operators

| Op. name | Syntax $e$ | Restrictions | Semantics ($r$ is the result of applying $e$ to $r_1$) |
|---|---|---|---|
| apply transformation | $AT^{Sr}_{f}(R_1)$ | $f \in F$; $is(f) = S$, $os(f) = S'$; $\alpha(Sr) \subseteq \alpha(R_1)$; $card(\alpha(Sr)) = card(\alpha(S))$ (a); $\alpha(e) = (\alpha(R_1) \setminus \alpha(Sr)) \cup \varrho_{[S \mid Sr]}(os(f))$ | $r = \{ \pi_{[\alpha(t) \setminus \alpha(t')]}(t) \wedge t' : t \in r_1,\ t' = \varrho_{[S \mid Sr]}(f(t'')),\ t'' = \varrho_{[Sr \mid S]}(\pi_{[\alpha(Sr)]}(t)) \}$ (b) |
| set selection | $\sigma^{s}_{C_f}(R_1)$ | $\alpha(e) = \alpha(R_1)$ | $r = \{ t : t \in r_1,\ C_f(t) \}$ |

(a) $card(s)$ returns the cardinality of the set $s$.

(b) Given two tuples of variables $T$ and $T'$, $\varrho_{[T \mid T']}(R)$ replaces in $R$ the $i$-th variable in $T$ with the $i$-th variable in $T'$.

Using the previous operators, we can now define the constraint algebra EGRA($\Phi$, $F$).

Definition 12: Let $\Phi$ be a decidable logical theory, admitting quantifier elimination and closed under complementation, and let $\Omega$ be a signature. Let $F$ be a set of admissible functions for $\Phi$ and $\Omega$. We denote by EGRA($\Phi$, $F$) the set of queries obtained by composing the semantics of the application dependent operators presented in Table XIII and the semantics of the EGRA($\Phi$) operators. □

Example 8: To show some examples of queries using application dependent operators, we consider metric relationships in spatial applications. Metric relationships are based on the concept of Euclidean distance referred to the reference space $E^2$. Since a quadratic expression is needed to compute this type of distance, metric relationships can be represented in EGRA(P, $F$) only if proper external functions are introduced. For example, the following two functions can be included in $F$:

- Distance: given a constraint $c$ with four variables $(X, Y, X', Y')$, representing two spatial objects, it generates a constraint $Dis(c)$ obtained from $c$ by adding a variable $D$ which represents the minimum Euclidean distance between the two spatial objects (thus the input schema is $[X, Y, X', Y']$, whereas the output schema is $[X, Y, X', Y', D]$).
- Buffer: given a constraint $c$, it generates the constraint $Buf_{\delta}(c)$ which represents all points that have a distance from $c$ less than or equal to $\delta$ (thus the input and output schemas coincide and correspond to $[X, Y]$).

A formal definition of these functions can be found in [4]. Some relevant spatial queries using external functions are shown in Table XIV(A).

In temporal applications, we believe that a "duration" function should also be inserted in the language. Note that the measure of the duration of an interval cannot be represented by D, since none of the mathematical operations are admitted in this theory. Therefore, in order to take into account the duration of an interval, the following external function has to be introduced in $F$:

- Duration: given the interval $t$, it produces the distance $Dur(t)$ on the time axis between its starting point and its ending point (for example, the constraint $(X \geq 6) \wedge (X \leq 10)$ is transformed into $(X = 4)$). If $t$ is a non-contiguous interval, the sum of the durations of all its intervals is produced. Input and output schemas coincide.

An example of a temporal query using the function Duration is reported in Table XIV(B). ◇

VIII. Conclusions and future work

Constraint databases use mathematical constraints to finitely model possibly infinite sets of relational tuples. In this paper, we have proposed a constraint relational model, based on the nested relational algebra, and investigated several related issues. The main novelty of our model, compared to other models, is the support for more general generalized tuples, the definition of a set-based language, and the introduction of external functions. We have shown that the algebra is more suitable for end-users than the algebra proposed in [23], but it is equivalent to it when identifiers are introduced in input databases. Through several examples, we have shown that the proposed algebra is well suited for spatial and temporal applications. An update language has also been developed, based on the same principles as the algebra [4]. Moreover, a set-based calculus extended with external functions and equivalent to the proposed algebra has also been proposed [5].

Future work includes several issues, either related to the specific model and algebra presented in this paper or, in general, to the use of constraint databases.

1. Issues related to the proposed model.

- External functions. With respect to external functions, an interesting direction is the syntactic and semantic characterization of admissible functions. To this purpose, work done for introducing aggregation in constraint databases could be useful [12], [13], [26].
- Logical optimization. The introduction of new set operators leads to the definition of new equivalence rules for EGRA expressions. These new rules are important not only for the logical optimization of EGRA expressions but, due to the equivalence between EGRA($\Phi$) and GRA($\Phi$) (modulo the use of generalized tuple identifiers), they can also be used to apply complex rewritings to GRA expressions in one step.
- Indexing data structures. The presence of set selection operators in EGRA requires the use of index structures to support both containment and intersection queries. Spatial data structures with good average bounds, such as R-trees and their variants [19], [35], can be used for those purposes. However, other data structures having good worst-case complexity and scaling well to high dimensions have to be developed. See [7] for some preliminary results.

2. More general issues.

- Canonical forms. In order to efficiently perform algebraic operations and reduce the redundancy of the representation, generalized tuples should be represented using some canonical form. A canonical form for dense-order constraints and its impact on the definition of generalized relational algebra have been discussed in [23]. Specific canonical forms for linear generalized tuples should also be developed.
- Cost-based optimization. In relational databases, information on data structures should be used after the logical optimization step to determine the optimal execution plan. A similar situation arises in spatial databases. Techniques applied in both kinds of systems should be integrated to define a cost-based optimizer for constraint databases. Preliminary results on this topic can be found in [8].
- Computational geometry algorithms. The use of constraints might sometimes simplify the execution of some spatial queries. For example, the intersection-based spatial join can be computed on constraints by applying a satisfiability check, without using any computational geometry algorithm. This new approach to processing spatial queries has to be compared with the classical one, based on the use of computational geometry algorithms.

TABLE XIV
Examples of spatial and temporal queries in EGRA(P, $F$) and EGRA(D, $F$)

(A) Spatial Queries

| Type | Query | EGRA expression | Conditions |
|---|---|---|---|
| DISTANCE QUERY | select all spatial objects in $R$ that are within 50 Km from the object in $S$ identified by the point $pt \in E^2$ | $\pi_{[X,Y]}(\sigma^{s}_{c}(R \bowtie \varrho_{[X \mid X', Y \mid Y']}(S')))$, where $S' \equiv AT^{[X,Y]}_{Buf_{50Km}}(\sigma_{P}(S))$ | $P \equiv C_{point}(pt)$; $\alpha(P) = \{X, Y\}$; $c \equiv (\pi_{[X,Y]}(t), \pi_{[X',Y']}(t), \bowtie \neq \emptyset)$ |
| SPATIAL JOIN | generate all pairs $(r, s) \in R \times S$ such that the distance between $r$ and $s$ is less than 40 Km, together with the real distance between $r$ and $s$ | $AT^{[X,Y,X',Y']}_{Dis}(\sigma^{s}_{c}(R \bowtie S'))$, where $S' \equiv \varrho_{[X \mid X', Y \mid Y']}(S)$ | $c \equiv (Q_1(t), Q_2(t), \bowtie \neq \emptyset)$; $Q_1(t) \equiv AT^{[X,Y]}_{Buf_{40Km}}(\pi_{[X,Y]}(t))$; $Q_2(t) \equiv \varrho_{[X' \mid X, Y' \mid Y]}(\pi_{[X',Y']}(t))$ |

(B) Temporal Queries

| Type | Query | EGRA expression | Conditions |
|---|---|---|---|
| DURATION QUERY | select the complete information about all trains standing at station $S$ for more than two minutes | $\sigma^{s}_{c}(A)$ | $P \equiv (I \geq 2)$; $Q(t) \equiv AT^{[X]}_{Dur}(\pi_{[I]}(t))$; $c \equiv (Q(t), P, \bowtie \neq \emptyset)$ |
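The Duration function of Example 8 and the duration query of Table XIV(B) can be illustrated with a small sketch. It is not the formal definition given in [4]: it assumes that an interval constraint is given as a list of closed intervals `(start, end)` on the time axis, that overlapping intervals are merged before summing (an assumption of this sketch, since the text only covers non-contiguous intervals), and that the selection condition is passed as a predicate. The names `duration`, `duration_query` and the record fields are hypothetical.

```python
def duration(intervals):
    """Total length of a union of closed intervals (start, end) on the time axis."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the running interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def duration_query(stops, condition):
    """Keep the stops whose standing time satisfies the given condition."""
    return [stop for stop in stops if condition(duration(stop["interval"]))]

# The constraint (X >= 6) and (X <= 10) has duration 4, as in Example 8.
assert duration([(6, 10)]) == 4
# For a non-contiguous interval the durations are summed.
assert duration([(6, 10), (12, 15)]) == 7

stops = [
    {"train": "T1", "interval": [(0, 1)]},
    {"train": "T2", "interval": [(3, 8)]},
]
long_stops = duration_query(stops, lambda minutes: minutes > 2)
assert [stop["train"] for stop in long_stops] == ["T2"]
```

Passing the condition as a predicate keeps the sketch independent of the exact comparison used in the selection condition of the table.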

References

[1] S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases, Addison-Wesley, 1995.
[2] S. Abiteboul and P. Kanellakis, "Query Languages for Complex Object Databases," SIGACT News, vol. 21, no. 3, pp. 9–18, 1990.
[3] J.L. Balcazar, J. Diaz, and J. Gabarro, Structural Complexity II, Springer Verlag, 1989.
[4] A. Belussi, E. Bertino, and B. Catania, "Manipulating Spatial Data in Constraint Databases," in LNCS 1262: Proc. of the 5th Symp. on Spatial Databases, 1997, pp. 115–141.
[5] A. Belussi, E. Bertino, and B. Catania, "Introducing External Functions in the Constraint Relational Calculus," In preparation, 1998.
[6] A. Belussi, E. Bertino, and B. Catania, "New Algebras for Constraint Relational Databases," Tech. Rep. 211-98, University of Milano, Italy, 1998.
[7] E. Bertino, B. Catania, and B. Shidlovsky, "Towards Optimal Two-Dimensional Indexing for Constraint Databases," Information Processing Letters, vol. 64, no. 1, pp. 1–8, 1997.
[8] A. Brodsky and Y. Kornatzky, "The LyriC Language: Querying Constraint Objects," in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1995, pp. 35–46.
[9] J. Byon and P.Z. Revesz, "DISCO: A Constraint Database System with Sets," in LNCS 1034: Proc. of the 1st Int. CONTESSA Database Workshop, Constraint Databases and their Applications, 1995, pp. 68–83.
[10] A.K. Chandra and D. Harel, "Computable Queries for Relational Data Bases," Journal of Computer and System Sciences, vol. 21, no. 2, pp. 156–178, 1980.
[11] C.C. Chang and H.J. Keisler, Model Theory, North-Holland, 1973.


[12] J. Chomicki, D. Goldin, and G. Kuper, "Variable Independence and Aggregation Closure," in Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1996, pp. 40–48.
[13] J. Chomicki and G. Kuper, "Measuring Infinite Relations," in Proc. of the 14th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1995, pp. 78–94.
[14] L. De Floriani, P. Marzano, and E. Puppo, "Spatial Queries and Data Models," in LNCS 716: Spatial Information Theory: a Theoretical Basis for GIS, 1993, pp. 123–138.
[15] M. Gargano, E. Nardelli, and M. Talamo, "Abstract Data Types for the Logical Modeling of Complex Data," Information Systems, vol. 16, no. 6, pp. 565–583, 1991.
[16] D.G. Goldin and P.C. Kanellakis, "Constraint Query Algebras," Constraints Journal, To appear.
[17] S. Grumbach and J. Su, "Dense-Order Constraint Databases," in Proc. of the 14th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1995, pp. 66–77.
[18] R.H. Gueting and M. Schneider, "Realm-Based Spatial Data Types: The ROSE Algebra," VLDB Journal, vol. 4, pp. 243–286, 1995.
[19] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1984, pp. 47–57.
[20] M.R. Hansen, B.S. Hansen, P. Lucas, and P. van Emde Boas, "Integrating Relational Databases and Constraint Languages," Computer Languages, vol. 14, no. 2, pp. 63–82, 1989.
[21] L. Hermosilla and G. Kuper, "Towards the Definition of a Spatial Object-Oriented Data Model with Constraints," in LNCS 1034: Proc. of the 1st Int. CONTESSA Database Workshop, Constraint Databases and their Applications, 1995, pp. 120–131.
[22] P.C. Kanellakis, "Elements of Relational Database Theory," in Handbook of Theoretical Computer Science, J. van Leeuwen, Ed., chapter 17. Elsevier Science, 1990.
[23] P.C. Kanellakis and D. Goldin, "Constraint Programming and Database Query Languages," in LNCS 789: Proc. of the Int. Symp. on Theoretical Aspects of Computer Software, 1994, pp. 96–120.
[24] P.C. Kanellakis, G. Kuper, and P. Revesz, "Constraint Query Languages," Journal of Computer and System Sciences, vol. 51, no. 1, pp. 25–52, 1995.
[25] M. Koubarakis, "Representation and Querying in Temporal Databases: the Power of Temporal Constraints," in Proc. of the Int. Conf. on Data Engineering, 1993, pp. 327–334.
[26] G.M. Kuper, "Aggregation in Constraint Databases," in Proc. of the 1st Int. Workshop on Principles and Practice of Constraint Programming, 1993, pp. 166–175.
[27] J.L. Lassez, "Querying Constraints," in Proc. of the 9th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1990, pp. 288–298.
[28] L. Libkin and L. Wong, "Conservativity of Nested Relational Calculi with Internal Generic Functions," Information Processing Letters, vol. 49, no. 6, pp. 272–280, 1994.
[29] S. Marcus and V.S. Subrahmanian, "Foundations of Multimedia Information Systems," Journal of the ACM, vol. 43, no. 3, pp. 474–523, 1996.
[30] J. Paredaens, "Spatial Databases, The Final Frontier," in LNCS 893: Proc. of the 5th Int. Conf. on Database Theory, 1995, pp. 14–31.
[31] J. Paredaens, J. Van den Bussche, and D. Van Gucht, "Towards a Theory of Spatial Database Queries," in Proc. of the 13th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1994, pp. 279–288.
[32] J. Paredaens and D. Van Gucht, "Possibilities and Limitations of Using Flat Operators in Nested Algebra Expressions," in Proc. of the 7th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, 1988, pp. 29–38.
[33] P. Revesz, "Datalog Queries of Set Constraint Databases," in LNCS 893: Proc. of the 5th Int. Conf. on Database Theory, 1995, pp. 424–438.
[34] M. Scholl and A. Voisard, "Thematic Map Modeling," in Proc. of the Symp. on the Design and Implementation of Large Spatial Databases, 1989, pp. 167–190.
[35] T. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-tree: A Dynamic Index for Multi-Dimensional Objects," in Proc. of the 13th Int. Conf. on Very Large Data Bases, 1987, pp. 507–518.
[36] D. Suciu, "Bounded Fixpoints for Complex Objects," in Proc. of the Int. Workshop on Database Programming Languages, 1994, pp. 263–281.
[37] D. Suciu, "Domain-Independent Queries on Databases with External Functions," in LNCS 893: Proc. of the 5th Int. Conf. on Database Theory, 1995, pp. 177–190.
[38] P. Svensson, "GEO-SAL: a Query Language for Spatial Data Analysis," in LNCS 525: Proc. of the Int. Symp. on Advances in Spatial Databases, 1991, pp. 119–140.
[39] J. Van den Bussche, "Complex Object Manipulation through Identifiers - an Algebraic Perspective," Tech. Rep. 92-41, University of Antwerp, Belgium, 1992.
[40] L. Vandeurzen, M. Gyssens, and D. Van Gucht, "On the Desirability and Limitations of Linear Spatial Database Models," in LNCS 951: Proc. of the Int. Symp. on Advances in Spatial Databases, 1995, pp. 14–28.
[41] L. Wong, "Normal Forms and Conservative Properties for Query Languages over Collection Types," Journal of Computer and System Sciences, vol. 52, no. 3, pp. 495–505, 1996.

Alberto Belussi received the Laurea degree in Electronic Engineering from the Polytechnic of Milan, Italy, in 1992 and the Ph.D. degree in Computer Engineering from the same institute in 1996. Since 1992 he has been working at the Department of Electronics and Information Science of the Polytechnic of Milan. He is currently also Teaching Assistant at the University of Verona, Italy. His main research interests include constraint databases, spatial databases and geographical information systems.

Elisa Bertino is professor of computer science in the Department of Computer Science of the University of Milan, where she heads the Database Systems Group. She has also been on the faculty in the Department of Computer and Information Science of the University of Genova, Italy. Until 1990, she was a researcher for the Italian National Research Council in Pisa, Italy, where she headed the Object-Oriented Systems Group. She has been a visiting researcher at the IBM Research Laboratory (now Almaden) in San Jose, at the Microelectronics and Computer Technology Corporation in Austin, Texas, and at Rutgers University in Newark, New Jersey. Her main research interests include object-oriented databases, distributed databases, deductive databases, multimedia databases, interoperability of heterogeneous systems, integration of artificial intelligence and database techniques, and database security. In those areas, Prof. Bertino has published several papers in refereed journals and in proceedings of international conferences and symposia. She is a co-author of the books "Object-Oriented Database Systems - Concepts and Architectures" 1993 (Addison-Wesley International Publ.), "Indexing Techniques for Advanced Database Systems" 1997 (Kluwer Academic Publishers), and "Intelligent Database Systems" forthcoming (Addison-Wesley International Publ.). She is or has been on the editorial boards of the following scientific journals: the IEEE Transactions on Knowledge and Data Engineering, the International Journal of Theory and Practice of Object Systems, the Very Large Database Systems (VLDB) Journal, the Parallel and Distributed Database Journal, the Journal of Computer Security, Data & Knowledge Engineering, and the International Journal of Information Technology. Elisa Bertino is a member of ACM, IEEE and AICA and has been named a Golden Core Member for her service to the IEEE Computer Society.
She has served as Program Chair of the 1996 European Symposium on Research in Computer Security (ESORICS'96), as General Chair of the 1997 International Workshop on Multimedia Information Systems, and as Program Co-Chair of the 1998 IEEE International Conference on Data Engineering (ICDE).


Barbara Catania has been a Ph.D. student in Computer Science at the Department of Computer Science of the University of Milan, Italy, since November 1993. She received with honours the Laurea degree in Computer Science from the University of Genova, Italy, in 1993. She has also been a visiting researcher at the European Computer-Industry Research Center, Munich, Germany, where she participated in the ESPRIT project IDEA, sponsored by the European Economic Community. Her main research interests include constraint databases, deductive databases, indexing techniques in object-oriented and constraint databases, and database security. She is also a co-author of the book "Indexing Techniques for Advanced Database Systems" 1997 (Kluwer Academic Publishers). She served as PC member of the 1996 and 1997 International Symposium on Applied Corporate Computing (ISACC'96, ISACC'97) and of the 1998 IEEE International Conference on Data Engineering (ICDE'98).
