
ADL - An Algebraic Database Language

Hennie J. Steenhagen, Peter M.G. Apers
University of Twente, Department of Computer Science
PO Box 217, NL 7500 AE Enschede, The Netherlands

Abstract

In this paper we present the language ADL, an algebraic database query language for complex objects. Non-administrative database applications pose new requirements on database languages with respect to modelling capability, deductive capability, and performance. To meet those new requirements, on the front end increasingly powerful database programming languages are being defined. On the back end a lot of research is directed towards investigation of the opportunities multi-processor systems offer for database systems. ADL is designed as an intermediate algebraic optimization language to bridge the gap between those new programming languages, which often are of a declarative nature, and the increasingly complex system architectures. The language supports a rich collection of data types including set, list, tuple, variant, object and reference types. With these types, a collection of powerful, set-oriented algebraic operators is defined. Deductive capability will be further increased by the possibility to define recursive queries. An important issue is the distinction between basic operators and operators introduced for optimization purposes.

1 Introduction

1.1 Motivation

Until now databases have been used mainly in administrative environments. Support for technical applications such as CAD/CAM or cartography, or for advanced office applications such as document storage, consisted of ad hoc programs in a third-generation programming language, or was lacking altogether. Current research in database systems is directed towards support for non-administrative applications. Such applications pose new requirements on database systems with respect to modelling capability, deductive capability, and performance. To meet those new requirements, research in database systems has taken several approaches in the past years.

One of the first approaches taken to increase modelling capability was the semantic data model approach, with its emphasis on semantic modelling constructs like aggregation, generalization/specialization, grouping, multi-valued attributes, and so on. The best known example of a semantic data model is the (Enhanced) Entity Relationship model [Elmasri89]. The complex object types of many current (object-oriented) data models can be seen as the `implementation' of semantic modelling constructs. With respect to deductive power, however, semantic data models often lack a proper, well-defined query language.

To increase deductive capabilities, a lot of research effort has been put into investigating the possibilities of logical query languages. Important features of logical query languages are their declarative nature and the good facilities they offer for writing recursive queries. For example, the first logical query language, Datalog, a logical interface language to relational database systems, has grown into the full-blown database programming language LDL [Naqvi89]. SQL-like languages are also being extended with recursion to increase their deductive power.

With respect to performance, research has been directed towards the investigation of the opportunities multi-processor systems (distributed or parallel) offer to support database systems, e.g. [Apers90].

In short, database query languages are becoming more and more powerful, high-level, often declarative database programming languages, which must be compiled towards increasingly complex system architectures. Therefore the need for intermediate optimization languages grows, on the one hand because the distance between source and target language increases, and on the other hand because the high-level, declarative nature of query languages requires much optimization.

In this paper we present a new language called Algebraic Database Language (ADL), which is very well suited to serve as an intermediate optimization language for multi-processor database systems supporting high-level database programming languages. ADL is an algebraic language for complex objects, and its function should be comparable to the function of the relational algebra in relational database theory and practice. We expect that logical languages like LDL and SQL-like languages such as HDBL [Pistor86b] can be easily translated into ADL. ADL is an algebraic language which 1) has a rich type system allowing for complex objects, 2) supports recursion, and 3) allows for efficient execution. These aspects are discussed in more detail below.

Modelling Capabilities. Current database design practice often consists of two steps. First the Universe of Discourse is modelled by means of a semantic data model like the Enhanced Entity Relationship model [Elmasri89], or NIAM [Wintraecken86]. In the second step the semantic model of the Universe of Discourse is mapped to the data model of the database system in use, often a relational system. This means that the complex (entity, relationship, object) types of the semantic model are flattened and dispersed over several tables in the relational system. To retrieve a complex object a number of (expensive) joins have to be performed, and to maintain integrity a lot of additional constraints must be introduced. Semantic modelling constructs that cannot be mapped directly to a relational system include composite attributes, multi-valued attributes or relationships, generalization/specialization, categories, sharing of values, etc.

The insufficient modelling capability of the relational model has long been recognized. The first extension of the relational model was the NF2 model [Schek86], which allows for relation-valued attributes. A next extension was the XNF2 model [Pistor86a], which permits unconstrained nesting of (multi)set, list, and tuple types. To be able to handle complex objects, ADL offers, besides standard atomic data types such as integer, bool, and string, a number of complex type constructors. The ADL complex data types are set and list types, tuple and variant (or union) types, and object and reference types. Complex types can be constructed by unconstrained inductive application of these type constructors.

Deductive Capabilities. Technical applications like CAD/CAM or cartography often involve recursive structures (part-subpart database) and/or recursive queries (bill-of-material, finding a route between two cities). The database query language must offer means to handle recursion. Optimization of recursive queries in logical query languages has been studied extensively [Bancilhon89]. In a future version of the language we will include an operator like the μ-calculus operator of [Apers86], adapted to complex objects, to be able to optimize recursive queries.

Performance. From experience with the relational model we know algebraic languages are very suitable for optimization [Apers90]. SQL-like languages and logical languages are declarative in the sense that they allow the user to specify what he wants without specifying how to obtain it. Algebraic languages are both declarative and procedural. They are declarative in the sense that an algebra allows for specifying the result in a constructive way by applying the operators, without limiting the execution to the way the query is constructed. They are procedural in the sense that, after rewriting the query and maybe adding operators for optimization purposes, the rewritten query can be interpreted as an execution schedule. This makes an algebraic language very suited to optimization.

Another advantage of an algebraic language is the possibility of parallel execution of queries. The computational power required to support technical applications exceeds the limits of single-processor systems. Multi-processor systems, either shared-nothing or with a shared memory, will in future support database systems. From our experience in PRISMA [Apers90] we know that a set-oriented language like an algebra is required to take full advantage of a parallel environment. Hence, ADL is a set-oriented algebraic query language offering many possibilities for optimization.

1.2 Related work

In the past there have been several proposals for algebras for complex objects. Complex objects are characterized by the inductive application of constructors such as the set constructor {}, the tuple constructor [], and the list constructor <>. At the lowest level these constructors are applied to atomic data types such as integer, real, bool, char, and string. The relational model only allows for objects of the form {[]}. The first extension of this model was called NF2 (Non First Normal Form, [Schek86]) and allowed for objects constructed by applying the set and the tuple constructor alternately, starting with the set constructor. This comes down to allowing relation-valued attributes. The next extension, called XNF2 (eXtended NF2, [Pistor86a]), allows for objects constructed by applying the above mentioned constructors in arbitrary order.

Algebras for the NF2 model, which are extensions of the relational algebra, have been proposed for instance by [Thomas86], [Schek86], and [Colby89]. In [Thomas86] special operators nest and unnest provide the possibility to operate on nested relations. If a query involves nested attributes, a relation first has to be flattened with the unnest operator. After performing operations on the attributes of interest the relation can be nested again with the nest operator. The algebra of [Schek86] also includes the nest and unnest operators. Moreover, the relational operators select and project are redefined to make it possible to access nested relations without first flattening them. In [Colby89] a recursive definition of the relational algebra operators is given such that access to nested attributes is possible without making use of the nest and unnest operators.

In [Beeri88] a model for complex objects is described in which set and tuple constructors can be applied in arbitrary order, i.e. they do not have to alternate as in the NF2 model. Also some new operators are described: replace, powerset, and set-collapse. The replace operator also appears in [Shaw89], where it is called image. The algebra of [Shaw89] is a proposal for an algebra for an object-oriented data model with abstract data types and object identity. The top level types are always set types, i.e. classes (collections of objects) are taken to be sets.

The data types available in ADL include those of the systems above. Moreover, list and variant types, and object and reference types are supported. Constructors can be applied in arbitrary order, and values of any type can be stored in the database. Some of the operators of ADL, e.g. apply, restrict, and flatten, have been defined elsewhere [Beeri88], [Shaw89]. However, to support the new data types, we have defined additional operators (or adapted existing ones), and we have also defined some new operators for optimization purposes. In short, ADL can be characterized as a strongly typed algebraic query language offering:


- a rich type system allowing for complex objects,
- a collection of powerful operators,
- recursion,
- good opportunities for optimization.

To gain experience with the language and to be able to judge design decisions we have implemented a prototype system in Prolog. Also a formal semantics of the algebraic operators of the language has been defined [Steenhagen90].

The rest of this paper is organized as follows. In Section 2 we describe the ADL data types. In Section 3 we discuss the ideas underlying the definition of the algebraic operators, which are described in Section 4. In Section 5 the programming language features of ADL are briefly described, and in Section 6 we give some examples and equivalence rules. In Section 7 we give an outline of future work on the language and the system.

2 ADL types

In this section we describe the structural component of the ADL data model, i.e. the data types supported by ADL. The algebraic operators that are defined on these data types will be described in Section 4.

The atomic data types of ADL are int, real, bool, char, string, and oid. The type oid is a domain of system-generated unique values, which serve as object identifiers. Values can be drawn from this domain by calling the function new_oid. These values can only be shown or be tested for equality. Starting from the atomic data types the following type constructors can be applied inductively:

- The tuple type constructor []. If a1, ..., an are distinct labels and τ1, ..., τn are types, then [a1 : τ1, ..., an : τn] is a tuple type.
- The set type constructor {}. If τ is a type, then {τ} is a set type.
- The list type constructor <>. If τ is a type, then <τ> is a list type.
- The variant type constructor. If a1, ..., an are distinct labels and τ1, ..., τn are types, then the variant over the alternatives a1 : τ1, ..., an : τn is a variant type.
- The object type constructor obj. If τ is a type, then obj(τ) is an object type.
- The reference type constructor ref. If τ is an object type (of the form obj(σ) for some type σ), then ref(τ) is a reference type.

As in the XNF2 model the type constructors can be applied in arbitrary order (except for the reference type constructor). An example of an ADL type is the following complex type reports (see Figure 1).

type reports = {[rep_no:int,
                 authors:<[name:string, address:[street:string,nr:int,city:string]]>,
                 title:string,
                 descriptions:{[keyword:string, weight:int]}]}
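The inductive construction of types can be illustrated with a small Python sketch. It is not part of ADL or of the authors' Prolog prototype; the class names `Atomic`, `TupleT`, `SetT`, `ListT` and the helpers `tup` and `show` are hypothetical names chosen for this illustration, and variant, object and reference types are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "int", "string" (assumed spelling of the atomic types)


@dataclass(frozen=True)
class TupleT:
    fields: tuple  # ((label, type), ...) with distinct labels


@dataclass(frozen=True)
class SetT:
    elem: object  # element type


@dataclass(frozen=True)
class ListT:
    elem: object  # element type


def tup(**fields):
    """Build a tuple type from label=type pairs (keyword order is kept)."""
    return TupleT(tuple(fields.items()))


def show(t):
    """Render a type in the bracket notation used in the text."""
    if isinstance(t, Atomic):
        return t.name
    if isinstance(t, SetT):
        return "{" + show(t.elem) + "}"
    if isinstance(t, ListT):
        return "<" + show(t.elem) + ">"
    if isinstance(t, TupleT):
        return "[" + ",".join(f"{a}:{show(s)}" for a, s in t.fields) + "]"
    raise TypeError(f"not a type: {t!r}")


INT, STRING = Atomic("int"), Atomic("string")

# The complex type `reports`, built by nesting the constructors freely.
reports = SetT(tup(
    rep_no=INT,
    authors=ListT(tup(name=STRING,
                      address=tup(street=STRING, nr=INT, city=STRING))),
    title=STRING,
    descriptions=SetT(tup(keyword=STRING, weight=INT)),
))

print(show(reports))
```

Because every constructor simply wraps another type, arbitrary nesting comes for free; a type checker for the operators could be written by pattern matching on these classes.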


Figure 1: The complex type reports

From the relational model we know set and tuple types are indispensable. List types are useful too: lists are ordered and may contain duplicates. Variant types can be used when the type of an object may be one of several, exclusively. A postal address, for example, may be a home address or a post office box number:

type address =

Object and reference types are included to allow sharing of values, and to allow representation of relationships by means of system-generated object identifiers. Object and reference types can be seen as `macro-types'; they are expanded as follows.

type ot = obj(τ) is expanded to type ot = [id:oid, val:τ], and
type rt = ref(τ) is expanded to type rt = oid (τ being an object type)

This mechanism allows static typing of references (but does not support referential integrity).

Most Enhanced Entity Relationship modelling constructs can now be mapped directly to ADL data types. Categories (entity types having two or more superclasses representing distinct entity types, like the owner of a car being a person or a company, [Elmasri89]) can be mapped to variant types, composite attributes to tuple-valued attributes, multi-valued attributes to set- or list-valued attributes, etc. Object and reference types allow sharing of values and representation of relationships by means of system-generated object identifiers instead of user-controlled keys. However, generalization/specialization cannot be mapped directly. See Section 7.
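The macro expansion of object and reference types can be sketched as a recursive rewrite over a type tree. The Python below is only an illustration under assumptions: the class names and the function `expand` are hypothetical, and expanding nested occurrences recursively is an assumption of this sketch (the text only states the top-level rule).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class TupleT:
    fields: tuple  # ((label, type), ...)


@dataclass(frozen=True)
class SetT:
    elem: object


@dataclass(frozen=True)
class ListT:
    elem: object


@dataclass(frozen=True)
class Obj:
    val: object  # obj(t): an object carrying a value of type t


@dataclass(frozen=True)
class Ref:
    target: object  # ref(t): t must itself be an object type


OID = Atomic("oid")


def expand(t):
    """Rewrite obj(t) to [id:oid, val:t] and ref(t) to oid, everywhere in t."""
    if isinstance(t, Obj):
        return TupleT((("id", OID), ("val", expand(t.val))))
    if isinstance(t, Ref):
        # Static typing of references: the target has to be an object type.
        if not isinstance(t.target, Obj):
            raise TypeError("ref may only be applied to an object type")
        return OID
    if isinstance(t, SetT):
        return SetT(expand(t.elem))
    if isinstance(t, ListT):
        return ListT(expand(t.elem))
    if isinstance(t, TupleT):
        return TupleT(tuple((a, expand(s)) for a, s in t.fields))
    return t  # atomic types are left unchanged


person = Obj(TupleT((("name", Atomic("string")),)))
car = TupleT((("plate", Atomic("string")), ("owner", Ref(person))))
print(expand(car))
```

After expansion a reference is just an oid, which is why the mechanism gives static typing (checked before expansion) but no referential integrity (nothing in the expanded type ties the oid to an existing object).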

3 ADL Operators - Design Considerations

In this section we will first consider some general design principles, and then discuss somewhat more practical issues concerning the choice of operators.


3.1 Design goals

The specific choice and definition of operators, given the collection of data types, has been guided by the following general design principles.

Safety. This leads to typing. ADL is a strongly typed language. Each construct of the language has a type, and type checking takes place at compile time. Operators are type specific; however, limited use has been made of overloading, for example for operators that apply to both list and set types.

Orthogonality. The aim is to define a minimal set of operators which can be combined freely (taking into account type restrictions) to provide the expressive power needed.

Optimizability. This leads to set-orientation and the introduction of additional operators.

Set- (or list-) orientation. The language should be set-oriented as much as possible, because a tuple-oriented language is less optimizable and offers fewer opportunities for parallel execution.

Additional operators. If a certain operator sequence occurs frequently, and if it is possible to define an operator which has the same semantics as the original operator sequence but which can be implemented much more efficiently, then this should be done. A well-known example of this principle is the relational join, which is an optimization of a selection on a Cartesian product.

The goals of orthogonality (a minimal set of operators) and optimizability (introduction of additional operators) are conflicting goals. Therefore, the aim is to make a clear distinction between 1) the basic operators of the language, i.e. the operators that are strictly necessary with respect to computational power, 2) operators added to the language to ensure performance, and 3) macros, i.e. operators added to the language for convenience. In table 2 the ADL operators have been listed according to the above classification. This table is not fixed. An operator that is defined as a macro at present may become a `performance operator' in the future; operators may be added, or omitted, etc.

3.2 Practical considerations

Specific choice of operators. Our goal is to determine a minimal set of operators, given the collection of data types we want to support. We do not want to include redundant operators in our language, unless for optimization purposes. The basic abstract operations we do want to support are the following.

- Creation. Constructors are supplied for this purpose.
- Reconstruction, i.e. changing structure. Changing structure can be done by removing type constructors (e.g. flatten, element), by converting types (e.g. makeset, order_by), or by introducing additional type constructors (e.g. group_by).
- Collection, i.e. combination of values. Collection may involve combination of values of the same type (e.g. plus), or of different types (e.g. product).
- Extraction, i.e. selection of parts. Extraction may change the type structure (e.g. field selection), or preserve types (e.g. restrict).
- Testing. Of course boolean expressions are part of the language.
- Choice, i.e. conditional operation. The operator cond is the ADL version of the if-then-else construct.
- Recursion. At the moment recursion is only supported by the possibility to define recursive procedures (see Section 5), but in the future we will provide operator(s) allowing recursive queries.

Furthermore we must provide means to access nested values. This is discussed below.

Accessing nested values. ADL types can be nested arbitrarily deep and we must have a way to access nested values. Set and list operators like plus, product, etc. only affect the operand values as a whole: the result of such operations contains direct copies of the members of the operand values. We might, however, want to apply operators to nested values, for instance apply a projection to a nested tuple, or a restriction to a nested set. In the past there have been several proposals to provide means to access nested values in complex object models, for example:

- Use operators unnest and nest. First flatten a complex value, then perform the desired operation and then, if necessary, nest the result again [Thomas86], [Schek86].
- Allow algebraic expressions to appear at any place where attributes may occur in the flat (relational) algebra [Schek86].
- Identify the values of interest by means of a certain path description [Colby89].
- Define a dedicated `navigation operator'. This is the map functional, which is well-known from functional languages, and which appears under many different names in literature, e.g. replace [Beeri88], or image [Shaw89].

The first three proposals are suited for NF2 models only, because set and tuple types are nested so neatly in those models. In more general models, like ADL, we do need a solution like the fourth, so we included the map functional in our algebra as the operator apply.
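How the map functional reaches a nested value can be illustrated with a short Python sketch. It is an illustration only, not the authors' implementation: ADL sets are modelled as `frozenset`, ADL tuples as the hypothetical hashable dict subclass `Tup`, and the data and the function `keep_heavy_keywords` are made up for the example.

```python
class Tup(dict):
    """Sketch of an ADL tuple: labelled fields, hashable so it can be a set member."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def apply(f, s):
    """apply[f](S): apply f to every element of the set S."""
    return frozenset(f(x) for x in s)


def restrict(p, s):
    """restrict[p](S): keep the elements of S that satisfy p."""
    return frozenset(x for x in s if p(x))


def keep_heavy_keywords(reports, threshold):
    """Restrict the nested set `descriptions` inside every report."""
    def rewrite(r):
        nested = restrict(lambda d: d["weight"] >= threshold, r["descriptions"])
        # Copy the report, replacing only the nested set.
        return Tup(r, descriptions=nested)
    return apply(rewrite, reports)


reports = frozenset({
    Tup(title="ADL", descriptions=frozenset({
        Tup(keyword="algebra", weight=5),
        Tup(keyword="prototype", weight=1),
    })),
})

result = keep_heavy_keywords(reports, 3)
print(result)
```

The outer `apply` navigates one level down; the function passed to it is free to use any operator on the nested value, so no unnest/nest round trip is needed.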

4 ADL Operators - Description

Besides the operators that we will describe in this section, the following constructs are part of the algebra: denotations, boolean expressions, the ternary conditional operator cond, and the function new_oid. Denotations are either basic constants or built up from expressions and value constructors [] (for tuples), {} (for sets), <> (for lists), the variant constructor (for variants), obj (for object types) and ref (for reference types). Atomic predicates are built up from comparison operators and arbitrary algebraic expressions. Comparison operators include those defined on atomic types, the set comparison operators in, subset, subseteq and =, and the list comparison operators in, sublist, sublisteq and =. Compound predicates are built up from atomic predicates and the boolean connectives and, or, and not. The operator cond, taking three arguments, is the operator version of the if-then-else construct. The function new_oid delivers new unique values from the domain oid, and is called by the system whenever object type values are being created.

The collection of operators (listed in Table 1) can now be divided into four groups, according to their syntax. They will be described in the following sections. Note that many operators are defined on both set and list type operands.

4.1 Unary Operators

The group of unary operators consists of the following. card is defined on sets and lists and returns the number of elements (for lists, duplicates are counted). flatten, applied to a set of sets or a list of lists, removes one constructor (and duplicates if necessary). element, defined on a one-element set, delivers that element. first and rest are defined on non-empty lists and have the obvious semantics, reverse reverses a list, and makeset converts a list to a set (removing duplicates). Less informally:

- card({x1, ..., xn}) = n (n ≥ 0; xi ≠ xj for all i ≠ j, 1 ≤ i, j ≤ n)
- flatten(S) = {x | x ∈ s ∧ s ∈ S} (S a set)
- element({x}) = x
- first(<x1, ..., xn>) = x1 (n ≥ 1)
- rest(<x1, ..., xn>) = <x2, ..., xn> (n ≥ 1)
- reverse(<x1, ..., xn>) = <xn, ..., x1> (n ≥ 0)
- makeset(<x1, ..., xn>) = {y1, ..., ym} (m, n ≥ 0, m ≤ n, {x1, ..., xn} = {y1, ..., ym})

In addition, we have operators id and val defined on object types, delivering the object identifier and the value of the object, respectively.
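The unary operators map directly onto standard collection operations. The following Python sketch is an illustration under assumptions, not the prototype: ADL sets are modelled as `frozenset` and ADL lists as Python tuples, and raising `ValueError` for operands outside an operator's domain is a choice of this sketch.

```python
def card(c):
    """Number of elements; for lists, duplicates are counted."""
    return len(c)


def flatten(c):
    """Remove one constructor from a set of sets or a list of lists."""
    if isinstance(c, frozenset):
        return frozenset(x for s in c for x in s)  # duplicates vanish
    return tuple(x for s in c for x in s)          # order and duplicates kept


def element(s):
    """Deliver the single element of a one-element set."""
    if len(s) != 1:
        raise ValueError("element is only defined on one-element sets")
    (x,) = s
    return x


def first(l):
    if not l:
        raise ValueError("first is only defined on non-empty lists")
    return l[0]


def rest(l):
    if not l:
        raise ValueError("rest is only defined on non-empty lists")
    return l[1:]


def reverse(l):
    return l[::-1]


def makeset(l):
    """Convert a list to a set, removing duplicates."""
    return frozenset(l)


print(flatten(frozenset({frozenset({1, 2}), frozenset({2, 3})})))
print(makeset((1, 1, 2)))
```

Note that `card(makeset(l)) <= card(l)` always holds, which is the condition m ≤ n in the definition of makeset.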

4.2 Binary Operators

The binary operators are the following. plus, defined on sets and lists (of the same type), means set union and list append, respectively. minus, also defined on sets and lists, removes all values occurring in the second operand from the first. intersect is defined only on sets (for list type operands the result order is not clear). Below we will describe some other operators in detail.

Product. The operator product delivers the Cartesian product of two set or list type operands. If the operands of product are lists then the result is ordered. For sets we define:

product({x1, ..., xm}, {y1, ..., yn}) = {[c1 = x1, c2 = y1], ..., [c1 = x1, c2 = yn], ..., [c1 = xm, c2 = y1], ..., [c1 = xm, c2 = yn]}

The labels c1 and c2 are system-generated. Alternatively, product could have two label parameters. Note that product applied to relations (sets of tuples) does not give the same result as the relational Cartesian product operator. The product operator from the relational algebra is an extended Cartesian product: the elements of the operand sets, which always are tuples, are concatenated in the result (so attribute names may not overlap). In our model this is not possible because the elements of the operands can be of arbitrary type.

Restricted product. The rproduct is the complex version of the relational θ-join, a subset of the Cartesian product, and is defined on set and list type operands. For sets we define:

rproduct[p](S1, S2) = {[c1 = s1, c2 = s2] | s1 ∈ S1 ∧ s2 ∈ S2 ∧ p(s1, s2)}

Here p is a two-argument function delivering a boolean result, and c1 and c2 are system-generated labels which alternatively could be parameters of the operator. Operand types and values appear in the result unchanged. For example, Figure 2 shows the result type R when the operator is applied to operands of type O1 and O2.

Figure 2: The rproduct. Operand types O1 = {[name:[first], salary]} and O2 = {[name:[first, last], age]}; result type R = {[c1:[name:[first], salary], c2:[name:[first, last], age]]}.
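The set versions of product and rproduct can be sketched in a few lines of Python. This is an illustration only: sets are modelled as `frozenset`, tuples as the hypothetical hashable dict subclass `Tup`, and the list versions (which would have to fix a result order) are left out.

```python
class Tup(dict):
    """Sketch of an ADL tuple: labelled fields, hashable so it can be a set member."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def product(s1, s2):
    """Cartesian product: operands are paired under the labels c1 and c2."""
    return frozenset(Tup(c1=x, c2=y) for x in s1 for y in s2)


def rproduct(p, s1, s2):
    """Restricted product: only pairs satisfying the two-argument predicate p."""
    return frozenset(Tup(c1=x, c2=y) for x in s1 for y in s2 if p(x, y))


def restrict(p, s):
    return frozenset(x for x in s if p(x))


employees = frozenset({
    Tup(name=Tup(first="Joe"), salary=21000),
    Tup(name=Tup(first="John"), salary=21000),
})
persons = frozenset({
    Tup(name=Tup(first="Joe", last="Doe"), age=21),
})


def same_first(e, q):
    return e["name"]["first"] == q["name"]["first"]


joined = rproduct(same_first, employees, persons)
print(joined)
```

The operands are nested under `c1` and `c2` rather than concatenated, so the elements may be of any type. The result of `rproduct` equals a restriction applied to the full product, which is the sense in which it is an optimization of a selection on a Cartesian product.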

Greatest lower bound join. The glbjoin operator is the join operator of [Ohori88], and can be considered as the complex version of the natural join from the relational model: the glbjoin joins two values if they have a part in common, and the common part is included in the result just once. For relations (sets of tuples of atomic values) the glbjoin is indeed the natural join. The glbjoin is based on the existence of the subtype relation on types. The subtype relation, as defined by [Cardelli84], can be roughly described as follows. A type σ is a subtype of a type τ if σ contains at least the tuple fields of τ, and possibly more. The result type of the operation is the greatest lower bound of the two operand types. This means that the operand types are merged to form a result type including all fields occurring in the operand types. Two operand values are included in the result if they match on the part of the type structure that the operand types have in common (if any). An example of the greatest lower bound join of two sets of tuples is the following.

glbjoin({[name=[first='Joe'], salary=21.000], [name=[first='John'], salary=21.000]},
        {[name=[first='Joe', last='Doe'], age=21]})
  = {[name=[first='Joe', last='Doe'], salary=21.000, age=21]}

Note that the types of the operands do have a common part, i.e. {[name:[first:string]]}, and that those operand elements that have equal values on this common type structure are merged in the result. See Figure 3. It must be remarked, however, that the usefulness of this join operator still has to be shown in practice.

4.3 Ordering Operators

The ordering operators are min, max, order_by, and group_by. These operators have as parameter a so-called path expression, a sequence of labels separated by dots. This parameter is needed if we want to order a set of objects on a domain nested somewhere in the type structure of these objects. For instance, if we have the following type:

Figure 3: The glbjoin. Operand types O1 = {[name:[first], salary]} and O2 = {[name:[first, last], age]}; result type R = {[name:[first, last], salary, age]}.

type person = [name:string, age:int,
               address:[street:string,nr:int,city:string]]

then we may want to order a set of persons by the city they live in. Ordering relations like <, ≤, etc. are defined only on basic types like int, string, etc., which are the leaf nodes of the type tree. We must use path expressions to indicate on which leaf node domain we want to order this set. Below we will briefly describe the various ordering operations.

Order by. If we want to order the above set of persons by the city they live in we can do so by writing: order_by[address.city](persons). (In ADL, arguments of operators that do not represent an ADL value, i.e. a value of an ADL type, such as path expressions, are put between square brackets. This syntax is also used in [Guting89].) The order_by operator can be applied to a set or a list and delivers as a result a list of the elements ordered on the domain indicated by the path expression, in ascending order. To order a list or set in descending order we can next apply reverse.

Minimum/maximum. These operators can also be applied to sets and lists and deliver as a result the set or list of elements which are minimal or maximal on the ordering domain.

Group by. The operator group_by groups a set or list on the values the elements have on the domain indicated by the path expression. The result then is a set of sets or a list of lists.

The above description of ordering operators assumes ordering can take place on only one ordering domain. Of course it is possible to allow ordering, or grouping, on multiple ordering domains, in which case we have multiple path expressions as parameters.

Remark. It is not possible to perform ordering operations by means of the apply operator (and others), because apply does not consider relationships between operand elements; the operator works on separate operand elements.
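The ordering operators and their path-expression parameter can be sketched in Python as below. The sketch rests on assumptions: sets are `frozenset`, lists are tuples, tuples are the hypothetical hashable dict subclass `Tup`, path expressions are dotted strings, the names `min_by` and `max_by` stand in for the ADL operators min and max (to avoid shadowing Python builtins), and keeping groups of a list in order of first occurrence is a choice of this sketch.

```python
class Tup(dict):
    """Sketch of an ADL tuple: labelled fields, hashable so it can be a set member."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def follow(path, value):
    """Follow a path expression such as 'address.city' down to a leaf value."""
    for label in path.split("."):
        value = value[label]
    return value


def _like(coll, items):
    """Rebuild `items` as the same kind of collection as `coll`."""
    return frozenset(items) if isinstance(coll, frozenset) else tuple(items)


def order_by(path, coll):
    """Deliver a list in ascending order on the domain indicated by path."""
    return tuple(sorted(coll, key=lambda v: follow(path, v)))


def _extreme_by(path, coll, pick):
    if not coll:
        return coll
    best = pick(follow(path, v) for v in coll)
    return _like(coll, (v for v in coll if follow(path, v) == best))


def min_by(path, coll):
    return _extreme_by(path, coll, min)


def max_by(path, coll):
    return _extreme_by(path, coll, max)


def group_by(path, coll):
    """Set of sets (or list of lists) of elements sharing a value on path."""
    groups = {}
    for v in coll:
        groups.setdefault(follow(path, v), []).append(v)
    return _like(coll, (_like(coll, g) for g in groups.values()))


ann = Tup(name="Ann", age=30, address=Tup(street="Dorpsstraat", nr=1, city="Utrecht"))
bob = Tup(name="Bob", age=25, address=Tup(street="Kerkstraat", nr=2, city="Enschede"))
cas = Tup(name="Cas", age=25, address=Tup(street="Molenweg", nr=3, city="Utrecht"))
persons = (ann, bob, cas)

print([p["name"] for p in order_by("address.city", persons)])
```

Descending order is obtained as in the text, by reversing the result of `order_by` (here `order_by(...)[::-1]`).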

4.4 Parameterized Operators

Parameterized operators have, just like ordering operators, a parameter denoting a construct that does not (yet) represent an ADL value. The extra arguments of parameterized operators are, in most cases, functions. Such a function is written as a λ-expression, say λx.e, and e, the body of this function, may then be an arbitrary ADL expression, in which the variable x is bound by the λ-abstraction. To select tuple components and apply operators to variant values, the dot operator and the case expression are provided. The operator apply performs application of the function parameter to all elements of the set or list operand. The operator restrict has as parameter a predicate (a function delivering a boolean result), and includes in the result those elements of the set or list operand that satisfy this predicate. In short:

- [a1 = e1, ..., an = en].ai = ei
- case <|ai = e|> of a1 : f1, ..., an : fn = fi(e)
- apply[f](S) = {f(s) | s ∈ S}
- restrict[p](S) = {s | s ∈ S ∧ p(s)}

For optimization purposes we have included the operators apply_restrict, restrict_apply, and compose. One simple observation, well-known from relational algebra optimization, is that one should try to do as much as possible during one set or list access. The operators listed below make it possible to perform, during one access, first a restriction and then an application (apply_restrict), or first an application and then a restriction (restrict_apply), or two applications (compose). In the definition below compose has two function parameters; of course this can be easily extended to the general case of composition of n functions.

- apply_restrict[f, p](S) = {f(s) | s ∈ S ∧ p(s)}
- restrict_apply[p, f](S) = {x | s ∈ S ∧ x = f(s) ∧ p(x)}
- compose[f, g](S) = {f(g(s)) | s ∈ S}

Also the operators choose, project, and rename are included for optimization purposes. choose[p](S) selects an arbitrary element from the operand set or list satisfying predicate p. If there is no such element a run time error occurs. The operator is included to be used in case there is only one element satisfying the predicate, or in case we are only interested in one such element. Only part of the operand set or list then has to be searched.

Projection is realized by an expression like project[τ](E). If the type of the expression E is a subtype of the type τ, the projection is well-typed. The type of the expression E may be arbitrary. This is the project operator as defined in [Ohori88], which can also be defined in terms of application, field selection, and tuple denotations. An example of a projection is (see Figure 4 for parameter and operand type):

project[{[name:[first:string]]}]({[name=[first='Joe', last='Doe'], salary=21.000],
                                  [name=[first='John', last='Smith'], salary=21.000]})
  = {[name=[first='Joe']], [name=[first='John']]}

The operator rename is defined on sets and lists of tuples, and has as parameters two labels, an old and a new one. The tuple field carrying the old label is renamed with the new.
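The set versions of the parameterized operators can be sketched in Python as follows. This is an illustration, not the prototype: sets are `frozenset`, tuples are the hypothetical hashable dict subclass `Tup`, Python functions stand in for λ-expressions, and the type parameter of project is approximated by a nested dict of labels (`None` marking a field that is kept whole), with set constructors treated as transparent.

```python
class Tup(dict):
    """Sketch of an ADL tuple: labelled fields, hashable so it can be a set member."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def apply(f, s):
    return frozenset(f(x) for x in s)


def restrict(p, s):
    return frozenset(x for x in s if p(x))


def apply_restrict(f, p, s):
    """One pass: first restrict with p, then apply f."""
    return frozenset(f(x) for x in s if p(x))


def restrict_apply(p, f, s):
    """One pass: first apply f, then restrict the results with p."""
    return frozenset(y for y in (f(x) for x in s) if p(y))


def compose(f, g, s):
    """One pass: apply g, then f."""
    return frozenset(f(g(x)) for x in s)


def choose(p, s):
    """Some element satisfying p; stops searching at the first hit."""
    for x in s:
        if p(x):
            return x
    raise LookupError("choose: no element satisfies the predicate")


def project(shape, value):
    """Keep only the fields named in shape, recursively."""
    if shape is None:
        return value
    if isinstance(value, frozenset):
        return frozenset(project(shape, v) for v in value)
    return Tup((a, project(sub, value[a])) for a, sub in shape.items())


def rename(old, new, s):
    """Rename the field `old` to `new` in every tuple of the set."""
    return frozenset(Tup((new if a == old else a, v) for a, v in t.items()) for t in s)


staff = frozenset({
    Tup(name=Tup(first="Joe", last="Doe"), salary=21.000),
    Tup(name=Tup(first="John", last="Smith"), salary=21.000),
})

print(project({"name": {"first": None}}, staff))
```

The fused operators compute the same result as their two-step counterparts but traverse the operand once and build no intermediate set, which is the reason they are introduced.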

Figure 4: Projection (parameter type {[name:[first]]} and operand type {[name:[first, last], salary]}).

5 Programming Language Features

To the language kernel, the collection of algebraic operators, some programming language features have been added. One may write an ADL program consisting of a DDL and a DML part. The DDL part consists of statements for schema definition; the DML part consists of statements for the manipulation of ADL values. DDL statements are statements for type, procedure and variable definition. Types, procedures and variables may be persistent or temporary. The definition of database intension (types) has been separated from the definition of database extension (ADL values). Deletion of persistent objects is possible. DML statements are retrieve statements and assignments. Expressions of the language are identifiers, denotations, operations, procedure calls, predicates, arithmetical expressions, and the function new_oid.

The constructs that add expressive power to the language are arithmetical expressions and recursive procedures. A procedure has zero or more parameters and always delivers a result. The type of the parameters and result must be indicated in the procedure heading. The body of a procedure consists of an arbitrary algebraic expression. For an example of a recursive procedure see the following section.

6 Examples

6.1 Macros

A number of operations can be conveniently written as macros.
- Let S be a set or list of tuples containing a field named a, then
    S.a = apply[λx.x.a](S)
- Let S be a set or list of values of object type, then
    val(S) = apply[λx.val(x)](S)
    id(S) = apply[λx.id(x)](S)
- Let S be a set of values of object type, say obj(τ) for some type τ, and let R be a set of values of references to this type, then
    link(R, S) = apply[λx.val(x)](restrict[λy.id(y) in R](S))
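The macros can be tried out on a small model of object values. The Python sketch below is only an illustration under stated assumptions: an object is modelled as a pair of an identifier and a value (the hypothetical class Obj), a reference is modelled as a bare object identifier, and the macro id(S) is called ids to avoid clashing with the Python builtin id.

```python
from typing import Any, NamedTuple


class Obj(NamedTuple):
    oid: int    # object identifier, what id(x) delivers
    value: Any  # object value, what val(x) delivers


class Person(NamedTuple):
    name: str
    age: int


def apply(f, S):
    return {f(s) for s in S}


def restrict(p, S):
    return {s for s in S if p(s)}


def dot(S, a):
    """S.a = apply[λx.x.a](S)"""
    return apply(lambda x: getattr(x, a), S)


def val(S):
    """val(S) = apply[λx.val(x)](S)"""
    return apply(lambda x: x.value, S)


def ids(S):
    """id(S) = apply[λx.id(x)](S)"""
    return apply(lambda x: x.oid, S)


def link(R, S):
    """Values of the objects in S whose identifier occurs in the reference set R."""
    return apply(lambda x: x.value, restrict(lambda y: y.oid in R, S))


people = {
    Obj(1, Person("Joe", 30)),
    Obj(2, Person("John", 41)),
    Obj(3, Person("Ann", 25)),
}
refs = {1, 3}

linked = link(refs, people)
names = dot(val(people), "name")
```

Here link dereferences a whole set of references in one expression, which is the set-oriented counterpart of following a single pointer.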

6.2 Equivalence of Expressions

Expressions are considered to be equivalent if the result of evaluation is the same. (Equivalence is denoted by ≡.) Some examples of equivalent expressions are the following (let S, S1, S2 be sets):
- apply[f](restrict[p](S)) ≡ apply_restrict[f, p](S)
  restrict[p](apply[f](S)) ≡ restrict_apply[p, f](S)
  apply[f1](apply[f2](S)) ≡ compose[f1, f2](S)
  restrict[p1](restrict[p2](S)) ≡ restrict[p1 and p2](S)
- Let S be of type {[last : string, age : int]}, then
  rename[last, name](S) ≡ apply[λx.[name = x.last, age = x.age]](S)
  project[{[last : string]}](S) ≡ apply[λx.[last = x.last]](S)
- Of course many rewrite rules known from the relational algebra have their complex variants:
  restrict[p](rproduct(S1, S2)) ≡ rproduct(restrict[p′](S1), restrict[p′](S2))
  (p rewritten to p′, taking into account the extra tuple constructor)
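Equivalences of this kind are what an optimizer applies as rewrite rules. The following Python sketch shows one possible way to do so; it is not the ADL optimizer. Expressions are encoded as nested Python tuples of the form (operator, parameters..., operand), and the function names evaluate and fuse are hypothetical. Only the four fusion rules for apply and restrict are covered.

```python
def evaluate(e):
    """Evaluate an expression tree; any non-tuple leaf is a plain Python set."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    if op == "apply":
        f, arg = args
        return {f(s) for s in evaluate(arg)}
    if op == "restrict":
        p, arg = args
        return {s for s in evaluate(arg) if p(s)}
    if op == "apply_restrict":
        f, p, arg = args
        return {f(s) for s in evaluate(arg) if p(s)}
    if op == "restrict_apply":
        p, f, arg = args
        return {x for x in map(f, evaluate(arg)) if p(x)}
    if op == "compose":
        f, g, arg = args
        return {f(g(s)) for s in evaluate(arg)}
    raise ValueError(f"unknown operator: {op}")


def fuse(e):
    """Rewrite bottom-up, replacing nested apply/restrict pairs by fused operators."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    *params, arg = args
    arg = fuse(arg)
    inner = arg[0] if isinstance(arg, tuple) else None
    if op == "apply" and inner == "restrict":
        return ("apply_restrict", params[0], arg[1], arg[2])
    if op == "restrict" and inner == "apply":
        return ("restrict_apply", params[0], arg[1], arg[2])
    if op == "apply" and inner == "apply":
        return ("compose", params[0], arg[1], arg[2])
    if op == "restrict" and inner == "restrict":
        p1, p2 = params[0], arg[1]
        return ("restrict", lambda x: p1(x) and p2(x), arg[2])
    return (op, *params, arg)


square = lambda x: x * x
is_even = lambda x: x % 2 == 0
gt_two = lambda x: x > 2
S = {1, 2, 3, 4, 5, 6}

expr = ("apply", square, ("restrict", is_even, ("restrict", gt_two, S)))
fused = fuse(expr)
```

In this example the two restrictions are first merged into one, and the result is then fused with the application, so the three set accesses of the original expression become a single apply_restrict.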

6.3 Bill of material

An example of a recursive query (implemented by means of a recursive procedure) is the following (naive) computation of the transitive closure of a part-subpart relation (bill of material):

  type parts_subparts = {[part_nr : int, subpart_nr : int]}

  proc next(ps : parts_subparts) parts_subparts =
  begin
    flatten(apply[λx.apply[λy.[part_nr = x.part_nr, subpart_nr = y.subpart_nr]]
                          (restrict[λz.(x.subpart_nr = z.part_nr)](ps))]
            (ps))
  end

  proc bill_of_material(ps_old : parts_subparts) parts_subparts =
  begin
    var temp : parts_subparts := next(ps_old);
    cond(temp = ps_old, ps_old, bill_of_material(plus(ps_old, temp)))
  end
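The naive transitive-closure computation can be mirrored in a few lines of Python. This is a sketch under assumptions, not a transcription of the ADL procedures: a tuple [part_nr, subpart_nr] is modelled as a Python pair, the function name next_level is hypothetical, and the recursion stops when adding the newly derived pairs no longer changes the relation. That termination test compares the union with the previous relation and is therefore an assumption about the intended fixpoint condition, not the literal test temp = ps_old.

```python
def next_level(ps):
    """Pairs (x.part_nr, y.subpart_nr) for all x, y in ps with x.subpart_nr = y.part_nr."""
    return {
        (x_part, y_sub)
        for (x_part, x_sub) in ps
        for (y_part, y_sub) in ps
        if x_sub == y_part
    }


def bill_of_material(ps_old):
    """Naive transitive closure: extend the relation until no new pairs appear."""
    ps_new = ps_old | next_level(ps_old)
    if ps_new == ps_old:
        return ps_old
    return bill_of_material(ps_new)


parts = {(1, 2), (2, 3), (3, 4)}
closure = bill_of_material(parts)
```

The computation is naive in the sense that every round recomputes all derivable pairs from the whole relation, including the pairs found in earlier rounds.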

7 Future Work

Future work involves the following issues:
- Further investigation of optimization. Equivalence of expressions is a topic of research, as are the definition and efficient implementation of one or more operators to support recursion.
- Investigation of the possibility to support object-oriented concepts in ADL, the most important being generalization/specialization (isa-relationships, inheritance).
- Further refinement of the language and further work on our prototype implementation to test our ideas and concepts. The work started on the modelling of a cartographic application will also be continued.

Acknowledgements

We thank Henk Blanken, Bert v.d. Akker, and Frank Luisman for their contributions to this work.


References

[Apers86] Apers, P.M.G., Houtsma, M.A.W., and Brandse, F., "Extending a Relational Interface with Recursion", Proceedings Advanced Database Symposium, Japan, August 1986.
[Apers90] Apers, P.M.G., Hertzberger, L.O., Hulshof, B.J.A., Kersten, M.L., and Oerlemans, A.C.M., "PRISMA, A Platform for Experiments with Parallelism", submitted to IEEE Computer, 1990.
[Bancilhon89] Bancilhon, F., and Ramakrishnan, R., "An Amateur's Introduction to Recursive Query Processing Strategies", in Readings in Artificial Intelligence and Databases, eds. Mylopoulos, J., and Brodie, M.L., Morgan Kaufmann Publishers, 1989.
[Beeri88] Beeri, C., "Data Models and Languages for Databases", Proceedings ICDT, Lecture Notes in Computer Science, Springer Verlag, 1988.
[Cardelli84] Cardelli, L., "A Semantics of Multiple Inheritance", Semantics of Data Types, Lecture Notes in Computer Science, Springer Verlag, 1984, pages 51-67.
[Colby89] Colby, L.S., "A Recursive Algebra and Query Optimization for Nested Relations", Proceedings ACM SIGMOD International Conference on the Management of Data, Portland, June 1989.
[Elmasri89] Elmasri, R., and Navathe, S.B., Fundamentals of Database Systems, Benjamin/Cummings Publishing Company Inc., 1989.
[Guting89] Güting, R.H., "Gral: An Extensible Relational Database System for Geometric Applications", Proceedings VLDB, Amsterdam, August 1989.
[Naqvi89] Naqvi, S., and Tsur, S., A Logical Language for Data and Knowledge Bases, Computer Science Press, New York, 1989.
[Ohori88] Ohori, A., "Semantics of Types for Database Objects", Proceedings International Conference on Database Theory, Lecture Notes in Computer Science 326, Bruges, Belgium, August 1988, pages 239-251.
[Pistor86a] Pistor, P., and Andersen, F., "Designing a Generalized NF2 Model with an SQL-Type Language Interface", Proceedings VLDB, Kyoto, August 1986.
[Pistor86b] Pistor, P., and Traunmüller, R., "A Data Base Language for Sets, Lists and Tables", Heidelberg Scientific Center, Technical Report, October 1985.
[Schek86] Schek, H.J., and Scholl, M.H., "The Relational Model with Relation-Valued Attributes", Information Systems, Vol. 11, No. 2, 1986, pages 137-147.
[Shaw89] Shaw, G.M., and Zdonik, S.B., "Object-Oriented Queries: Equivalence and Optimization", Proceedings 1st International Conference on Deductive and Object-Oriented Databases, Kyoto, December 1989.
[Steenhagen90] Steenhagen, H.J., "Semantics of ADL", Technical Report, University of Twente, 1990. (To appear.)
[Thomas86] Thomas, S.J., and Fischer, P.C., "Nested Relational Structures", Advances in Computing Research III, The Theory of Databases, ed. P.C. Kanellakis, JAI Press, 1986, pages 269-307.
[Wintraecken86] Wintraecken, J.J., NIAM in Theorie en Praktijk, Academic Service, 1986.

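A ADL Operators

operator        | parameter               | first operand      | second operand | result
card*           | -                       | {τ}                | -              | int
flatten*        | -                       | {{τ}}              | -              | {τ}
element         | -                       | {τ}                | -              | τ
first           | -                       | <τ>                | -              | τ
rest            | -                       | <τ>                | -              | <τ>
reverse         | -                       | <τ>                | -              | <τ>
makeset         | -                       | <τ>                | -              | {τ}
id              | -                       | obj(τ)             | -              | oid
val             | -                       | obj(τ)             | -              | τ
plus*           | -                       | {τ}                | {τ}            | {τ}
minus*          | -                       | {τ}                | {τ}            | {τ}
intersect       | -                       | {τ}                | {τ}            | {τ}
product*        | -                       | {τ}                | {σ}            | {[c1 : τ, c2 : σ]}
join*           | path expr.              | {τ}                | {σ}            | {[c1 : τ, c2 : σ]}
glbjoin         | -                       | {τ}                | {σ}            | {υ}
min*            | path expr.              | {τ}                | -              | {τ}
max*            | path expr.              | {τ}                | -              | {τ}
order_by        | path expr.              | {τ}                | -              | <τ>
group_by*       | path expr.              | {τ}                | -              | {{τ}}
. (dot)         | ai                      | […, ai : τi, …]    | -              | τi
case of         | fi : τi → σ             | <|…, ai : τi, …|>  | -              | σ
restrict*       | f : τ → bool            | {τ}                | -              | {τ}
apply*          | f : τ → σ               | {τ}                | -              | {σ}
restrict_apply* | f : σ → bool, g : τ → σ | {τ}                | -              | {σ}
apply_restrict* | f : τ → σ, g : τ → bool | {τ}                | -              | {σ}
compose*        | f : σ → υ, g : τ → σ    | {τ}                | -              | {υ}
choose*         | f : τ → bool            | {τ}                | -              | τ
project         | τ                       | σ                  | -              | τ
rename*         | ao, an                  | {[…, ao : τi, …]}  | -              | {[…, an : τi, …]}

Table 1: ADL operators, parameters, operand and result types
* In this line { } may be replaced by < > (all or none).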


type      | level: basic                                                                  | level: efficiency                                                 | level: macro
object    | obj, val, id                                                                  | -                                                                 | -
reference | ref                                                                           | -                                                                 | link
tuple     | [ ], dot                                                                      | -                                                                 | -
variant   | case of                                                                       | -                                                                 | -
set       | { }, plus, restrict, apply, min/max, group_by, order_by, element              | restrict_apply, apply_restrict, compose, choose, rproduct, rename | dot, val, id, link
list      | < >, plus, restrict, apply, min/max, group_by, order_by, rest, first, makeset | restrict_apply, apply_restrict, compose, choose, rproduct, rename | dot, val, id, link
any       | glbjoin, cond                                                                 | project                                                           | -

Table 2: Classification of operators according to type and reasons for introduction
