PRACE IPI PAN ICS PAS REPORTS
Kazimierz Subieta, Catriel Beeri, Florian Matthes, Joachim W. Schmidt
A Stack-Based Approach to Query Languages
ICS PAS Report 738
INSTYTUT PODSTAW INFORMATYKI POLSKIEJ AKADEMII NAUK / INSTITUTE OF COMPUTER SCIENCE, POLISH ACADEMY OF SCIENCES
Warszawa, December 1993
Submitted by Piotr Dembinski
Authors' addresses:
Kazimierz Subieta, Institute of Computer Science, Polish Academy of Sciences, Ordona 21, 01-237 Warszawa, Poland, e-mail: [email protected]
Catriel Beeri, Hebrew University, Computer Science Department, Givat Ram, Jerusalem 91904, Israel
Florian Matthes, Joachim W. Schmidt, University of Hamburg, Department of Computer Science, Vogt-Kölln-Straße 30, D-2000 Hamburg 54, Germany
CR: D.3.2, H.2.3
Printed as a manuscript
ISSN: 0138-0648
Abstract
We follow a new paradigm of programming languages in which imperative programming constructs and programming abstractions are built around a declarative query language. Seamless integration of queries with programming constructs implies a new approach to query languages, in which we employ the classical notions of naming, scoping and binding. We define a simple abstract storage model which makes it possible to map the modelling primitives of data models, including the relational and object-oriented models. Our main concern is iterations encapsulated in the form $q_1 \theta q_2$, where $q_1$ and $q_2$ are arbitrary queries and $\theta$ is an operator with some tradition in the QL domain: $\theta$ can be selection, projection, navigation, join, quantifier, sorting, transitive closure, etc. Such operators are formally defined by an abstract machine which iteratively evaluates $q_2$ in new environments determined by the tuples returned by $q_1$. The machine is based on two stacks: the result stack, which stores partial results of evaluation, and the environment stack, which determines scoping and binding. Within one consistent semantic frame, the approach allows us to consider query constructs that resemble well-known approaches (tuple and domain relational calculi, SQL, and object-oriented query languages). It corresponds directly to a real implementation: it has already been implemented in the system LOQIS. The approach supports complex objects, object identities, null values and variants. Query operators can be seamlessly integrated with imperative constructs such as updating, for each, etc. We discuss procedures and functional procedures (views) based on queries, as well as object-oriented concepts. Finally, we present a new optimization method based on the proposed approach.
1 Introduction
Recently, query languages (QLs) have increasingly been drawing analogies with, and being integrated into, programming languages and environments, since advanced database applications require sophisticated programming rather than simple querying. Attempts to combine querying with programming led to the impedance mismatch, which has undermined the importance of QLs in database programming. However, as argued in [Beer89, SRH90, SRLG+90], QLs are a successful achievement of the database domain. For large and sophisticated applications QLs are indispensable for both technical and ergonomic reasons. Declarative queries are much easier to optimize than sequences of procedural statements, and they make parallel computation possible [BBKV87]. Beyond a certain scale of complexity, automatic optimization almost always yields better performance than manual optimization done in a procedural language. Moreover, QLs that are based on data independence and conceptual data views and that involve macroscopic operations can increase programmer productivity and program reliability, readability and modifiability by an order of magnitude.
In database systems QLs have two main roles. The first is ad hoc interactive querying and updating of a database by less experienced users (see SQL, QUEL, Query-By-Example, QBF [Ingr89] and many others, e.g. [CMW87, PPT91, ZhMe83]). In the second role, queries are used as high-level programming constructs with various applications: fetching, updating, inserting and deleting database data, and determining integrity constraints, views, snapshots, database procedures, scripts in 4GLs, rules, active capabilities, subschema definitions, access restrictions, etc. This paper is mainly devoted to the second role of QLs.
The popular classification distinguishes between the embedded and the integrated approach. SQL can be considered an important example of an embedded QL. Embedding, however, inherently suffers from aesthetic, technical and ergonomic drawbacks.
The majority of database programming languages (DBPLs) follow the integrated approach, for example, Pascal/R [Schm77], Galileo [ACO85], DBPL [ScMa92, MRSS92a], Napier88 [MBCD89], Machiavelli [OBB89], Taxis [MBW80], Adaplex [SFL81], LOQIS [Subi91], O2C [O2Ma92], etc. Commercial products such as Ingres Windows 4GL [Ingr90] and Oracle PL/SQL [Orac91] also integrate SQL with imperative statements.
There are two philosophies for integrating QLs with PLs. The first assumes a procedural PL with added-on QL constructs, and the majority of current DBPLs follow it. The second is just the reverse: a QL is the basis, and procedural constructs and programming abstractions are add-ons. This is implemented in Ingres Windows 4GL, in Oracle PL/SQL, in POSTGRES [SRH90], and in some AI developments [Mant91]. The advantages of the first philosophy are full computational and pragmatic universality, clean semantics owing to the long tradition of PL development, well-established programming abstractions (procedures, functions, ADTs, modules, classes, etc.), and a systematic treatment of strong and static type checking. This contrasts with the second line, whose main advantages are user friendliness, macroscopic programming, declarativeness, and data independence. In fact, although this line has resulted in less "smooth" PLs, it has enjoyed great commercial success.
In this paper we would like to combine both philosophies. We build the foundation of a QL-centered programming language according to the traditional paradigms of the PL domain. This idea is called "seamless" integration of a QL with a PL [Daya89]. The essence of our method is a modification of the classical PL methods and mechanisms, aimed at query processing.
Before we present our idea, we must explain a subtle point concerning the relationship between data models and data structures. Data models, especially conceptual data models, address human understanding of data semantics at some abstract level. In contrast, QLs and PLs are strongly related to data structures. For example, a query may map data objects into elements of some other set, e.g. into {TRUE, FALSE}. A formal definition of QLs and PLs is impossible without a formalization of the data structures that are to be queried or manipulated. Hence in our approach we distinguish a data model from an abstract storage model. The latter is to a large extent orthogonal to data models: the same storage model can be used to map relational, nested relational, functional, entity-relationship, object-oriented, etc. data models. We start directly from the definition of the storage model, assuming that there is a mapping from a particular data model into the storage model. As we will see, it is not difficult to develop informal principles for such a mapping. Our definitions of QL operators then address the storage model only. Again, we assume that there is (or could be) an informal mapping of these operators into the QL concepts of a particular data model. This approach is illustrated in Fig. 1.
[Figure 1 diagram. Informal part of our approach: data model; query in the data model; interpretation of the result in the data model. Formal part: query addressing the abstract storage model; abstract storage model; machine program; query result.]
Figure 1: Relationships between data models and the abstract storage model
Both QLs and PLs are based on the relationship between the names occurring in a language's statements and data structures. This relationship is called binding. An essential property of binding is the locality of names. The scope of a name in PLs is restricted to some well-defined context, for example, the scope of a variable local to a procedure. Scoping is also relevant to queries. For example, in embedded SQL, queries may contain names of (volatile) host variables. In a QL integrated with a PL, persistent and volatile variables have equal rights, so in general both are subject to scoping rules. Moreover, it is intuitively clear that, e.g., names of attributes have a somewhat different scope than names of relations. Thus in our approach we put PLs and QLs into a common semantic frame, referred to as naming-scoping-binding.
The run-time mechanism of classical programming languages (at least those of the Pascal family) is based on two stacks. The result stack stores partial results of arithmetic and other expressions (footnote 1). The environment stack is the main memory allocation mechanism and is used for binding the names occurring in a program. The stack-based approach to PLs is motivated by requirements such as orthogonality and unlimited nesting of language constructs, locality of programming objects (variables, procedure parameters), and recursion of procedures and functions. The stack-based mechanism is also relevant to QLs. Indeed, we believe that any serious development of a QL integrated with imperative constructs and programming abstractions must confront the issues of naming, scoping, binding and stacking. Defining QLs in the spirit of PLs, however, requires changes to the mechanism. The stacks assumed in classical PLs are not designed for associative processing of bulk data structures, nor for uniform treatment of persistent and volatile (complex) programming objects. In this paper we present an extended construction of the stacks (at the conceptual level) and define the semantics of QL operators through operations on these stacks.
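A minimal Python sketch can make the binding mechanism concrete. It rests on simplifying assumptions of our own. Each environment-stack section is a list of (name, value) binders. A name is bound by searching the sections from the top down. A selection of the form `q1 where q2` is evaluated by pushing the interior of each element returned by q1 as a new top section before q2 is evaluated. The names EnvStack, bind and where are hypothetical. The sketch also binds names directly to values rather than to internal identifiers and omits the result stack, so it is not the abstract machine defined later in the paper.

```python
from typing import Any, Callable, List, Tuple

Binder = Tuple[str, Any]  # (external name, bound value)


class EnvStack:
    """Hypothetical environment stack; the top section is the last list element."""

    def __init__(self) -> None:
        self.sections: List[List[Binder]] = []

    def push(self, section: List[Binder]) -> None:
        # Open a new scope, e.g. a procedure call or one iteration of a query operator.
        self.sections.append(section)

    def pop(self) -> None:
        # Close the most recently opened scope.
        self.sections.pop()

    def bind(self, name: str) -> List[Any]:
        # Search sections from the top down; the first section containing the
        # name wins, so inner names hide outer ones. All binders with that name
        # in the section are returned, which is how bulk data can be bound.
        for section in reversed(self.sections):
            found = [value for n, value in section if n == name]
            if found:
                return found
        raise NameError(f"unbound name: {name}")


def where(env: EnvStack, name: str, predicate: Callable[[EnvStack], bool]) -> List[Any]:
    """Simplified 'name where predicate': evaluate the predicate once per element,
    with that element's interior pushed as the new top section."""
    result = []
    for element in env.bind(name):
        env.push(element)  # the element is itself a list of binders (a "tuple")
        try:
            if predicate(env):
                result.append(element)
        finally:
            env.pop()  # restore the previous scope even if the predicate fails
    return result


env = EnvStack()
env.push([
    ("EMP", [("NAME", "Brown"), ("SAL", 2500)]),
    ("EMP", [("NAME", "Smith"), ("SAL", 2000)]),
    ("EMP", [("NAME", "Jones"), ("SAL", 1500)]),
    ("SAL", 0),  # an outer SAL, hidden by each employee's own SAL inside the predicate
])
rich = where(env, "EMP", lambda e: e.bind("SAL")[0] > 1800)
print([dict(emp)["NAME"] for emp in rich])  # ['Brown', 'Smith']
```

The predicate runs with each employee's interior on top of the stack, so that employee's SAL hides the outer SAL binder. Popping the section afterwards restores the previous scope, much as a procedure's local scope is closed in a classical PL.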
Building on these stacks, we then show how the mechanism can be used to integrate query constructs with imperative statements and programming abstractions.
The main motivation for the new approach to QLs is pragmatic universality (footnote 2). Pragmatic universality is frequently in opposition to conceptual simplicity and user-friendliness, and achieving a good trade-off is perhaps the major problem of QLs. On the one hand, QLs are to be used by humans, so they should introduce features that are easy to understand, use and combine in a variety of programming situations. On the other hand, these features should form a minimal and consistent set with formal (machine) semantics.
A number of factors contribute to the pragmatic universality of a PL based on a QL. Among them are the admissible data structures, introduced for various needs in programming and in conceptual modelling. Besides the classical relation, other (bulk) type constructors are considered: sets, bags, sequences, arrays, variants, etc. Modern DBPLs assume that all type constructors can be combined orthogonally. In particular, this leads to the NF² concept and, in general, to arbitrarily complex data objects. They also assume orthogonality of types and persistence. In particular, bulk data may be stored as volatile variables and individual variables may be stored in the database. This implies a uniform treatment of all such structures by a QL.
Footnote 1. The result stack is usually "hidden" in recursive definitions of language constructs. In this paper, however, our goal is to present the mechanism before any formalization of it.
Footnote 2. The frequently used term computational universality (in the Turing sense) is not very relevant to QLs. The goal of QLs is not to express a class of mathematical functions (as the Turing machine does), but to serve data structures for all required purposes.
Pragmatic universality means a large collection of retrieval operators: selections, projections, joins, navigation down hierarchical objects, navigation via references, quantifiers, resolution of name conflicts, comparisons, arithmetic and string operators/functions, aggregate functions, grouping, ordering, transitive closures, etc. It further requires a variety of imperative statements: creating new data objects, updating (assignment), inserting, deleting, if statements, loops, for each, etc. Current programming technologies also require a collection of programming abstractions: procedures, functional procedures, views, snapshots, classes, modules, etc. Programmers usually expect that procedures and functions may have parameters, can be recursive, and can update data objects via their parameters and via side effects. QLs used as PLs introduce a new quality. Since queries can be considered generalized PL expressions, it is natural to expect that queries will be used as actual parameters of procedures and will determine the output of functional procedures (views). The latter property makes it possible to use calls of functional procedures within queries.
We can confront the above factors, which determine the level of pragmatic universality, with current QL theories such as the relational algebra, the relational calculus, predicate logic, and others. These theories had a big influence on the development of QLs. However, the level of abstraction they assume makes the formal treatment of many of the features mentioned above difficult. Recently the database domain has been influenced by novel ideas (complex objects, object identity, classes and inheritance, roles, deduction, etc.), which increase the complexity of theories, especially if they extend the traditional relational or logic concepts. (An excellent survey of this line can be found in [Cruz89].)
Thus theories frequently abstract from vital properties such as updating. Data models and QLs, however, are getting more and more sophisticated, and their precise semantics can no longer be explained by intuitive extrapolation from the semantics of basic retrieval capabilities. We would therefore like to reconstruct the QL concept from a uniform PL perspective that covers the majority of phenomena found in practical QLs and in theoretical work. Our orientation is pragmatic (a consequence of implementation experience) and the presentation is semi-formal, but it is not difficult to see how to make it fully formal. In this paper we define QL constructs via operational semantics based on an abstract machine. Declarative semantics can be obtained by building a denotational model, whose preliminaries are presented in [Subi85, SuMi86].
The approach presented in this paper is relevant above all to object-oriented databases, but at the beginning we avoid direct associations with this idea. Our concept of QLs works for a slightly more general model than the object-oriented one, and we will show that a variety of object-oriented QLs can be obtained by specializing and modifying the presented definitions. Our considerations are to a large extent orthogonal to data models. Some topics considered in this paper are relevant to the relational model, extended relational models, nested relational models, entity-relationship models, functional models, and the data models assumed in DBPLs.
We made a great effort to simplify the presentation, but unfortunately the approach has little in common with known approaches to QLs. Some notions and examples may require a lot of imagination and concentration from the reader. The paper, however, is self-contained: we have reduced references to the PL literature as much as possible. The reader who is patient enough to follow through these (we hope) not very sophisticated concepts can enjoy the generality of our approach and, at the same time, its correspondence to the real implementation. It enables us, in one simple and consistent framework, to consider queries in the SQL style, in the tuple calculus (QUEL, DBPL) style, in the domain calculus style, and in object-oriented styles. It deals with selections, projections, navigations, quantifiers and very general joins, and introduces a general variant of transitive closure. It supports complex objects, references to objects, object identities, null values, variants, updating, for each statements, (database) procedures, (updatable) views, etc. We will also show that the approach leads to a powerful query optimization method.
The rest of the paper is organized as follows. In Section 2 we discuss the abstract storage model and present preliminary formal definitions. In Section 3 we discuss the abstract machine model, introducing details of the operational semantics used to specify QL constructs. In Section 4 we present and discuss various constructs of QLs (operators and comparisons, selection, projection, navigation, path expressions, join, quantifiers, calculus-like variables, transitive closure, ordering, null values and variants). In Section 5 we discuss programming constructs (assignments, for each statements, procedures and views) and issues concerning object-orientation. Section 6 presents a new method of query optimization. The paper ends with a conclusion.
2 An Abstract Storage Model
The range of type constructors and data structures that query languages have to serve is very wide. It includes individual variables, records (tuples), relations, sets, bags, sequences, arrays, complex attributes, repeating attributes, references (pointers), variants with and without explicit discrimination, null values, recursive data types, etc. In database programming languages, for example in DBPL [ScMa92], queries can combine access to persistent and volatile data structures. Object-orientation introduced further aspects: behavioral aspects, classes, object identities, structural and behavioral inheritance, and encapsulation. The large set of conceivable features of data structures increases the number of options needed to serve them. This spectrum of design choices causes problems for theoretical approaches, which need a pure and homogeneous, yet general, data concept. In our formal model of data structures we tried to achieve a balance of abstraction, minimality, and completeness.
Before introducing the details of the model, let us note some features that we want to exclude and some that we want to include. The model should not include a description of storage hierarchies and buffer management, or details of physical organization and indices. We also want the model to be general, so that query languages for various conceptual models, from value-based to object-based, can be described in it. This calls for abstracting from the details of conceptual models. On the other hand, although we are not interested in details of physical organization, we do want to reflect what the user sees. In particular, in a conceptual model an entity can include pointers to other entities. This is a basic feature of object-oriented models, and it allows implicit joins through dot notation.
A formal definition of the binding process requires establishing a relationship between the names occurring in a program (as seen by the programmer) and the stored data the program deals with. Classical semantics of imperative PLs assumes two name spaces. The first space contains symbolic names invented by the programmer. These usually have an informal descriptive role supporting the conceptual view of the program, for example DEPARTMENT, EMPLOYEE, SALARY, ACCOUNT, PART. The second space contains names that are used internally to identify data. In the simplest case, in assemblers, they are physical addresses. In denotational semantics they are "locations", in relational databases they are tuple identifiers (tids), and in object-oriented databases they are object identities. Complex objects (for instance, records) associate external names with their components, which leads to nesting of the binding relationship. For bulk types, an internal identifier must be associated with each element of a bulk data structure, rather than a single identifier with the whole structure as for other structures, because particular elements of a bulk structure may need to be identified internally.
There are several candidate formal models of data structures that reflect both name spaces and cover the concepts of bulk data and complex objects. We propose one of them. Intuitively, a stored database at a given point in time consists of a collection of stored entities, which we call storage objects. These should not be confused with the objects of a conceptual model. A storage object has three components that we care about: the value stored there, i.e., its content; its location or internal identifier; and the external name invented by the programmer or by the database designer. Since we want to abstract away from physical organization, we represent internal identifiers abstractly and are not interested in their pragmatic nature. They can be physical addresses, symbolic addresses, object identifiers, primary key values, etc. The content of a storage object can be one of three kinds: (1) an atomic value, for instance 5, "I am a string", etc.; (2) a complex value, which we represent as a set of storage objects; (3) a reference to another storage object, i.e. that object's internal identifier.
There are a number of aspects that we would like to abstract from in our definitions. In particular, we neglect types. Definitions of query operators have very little to do with types; indeed, the majority of current QLs are untyped. On the other hand, typed languages are a central research area in programming languages and a successful facility for automatic program checking, which is especially important for large programs. Queries are usually not so large, but Cardelli warns [Card89]: "a surprisingly common mistake consists in designing languages under the assumption that only small programs will be written". In this paper we take no stand on whether query languages should be typed, or on whether typing is a prerequisite for successful integration with programming languages. However, we realize that programs written in integrated QLs can be large, so a lack of strong typing may lead to low reliability and low programmer productivity. We will also present another argument in favor of typing: in some cases the untyped framework leads to semantic ambiguities connected with scoping rules, so some kind of typing information cannot be avoided in the proposed QLs. We would like to consider types only after recognizing the properties of QLs. This makes it possible to discuss which of these properties can or cannot be discarded because of the typing system, and which typing system is relevant for QLs (footnote 3).
We also abstract from persistence. Assuming orthogonality of data structures and persistence, QLs are independent of this feature. In particular, the variables defined in a program or a procedure are also storage objects, albeit of a more temporary nature than those in the database. The notion of storage object thus covers both volatile and persistent variables. For the subject of this paper, the term "storage object" seems more appropriate than the traditional "variable". To eliminate secondary features, we unify records, tuples, arrays, and all bulk structures, since all of them are collections of elements. This has led us to the following simple (but sufficiently rich) definition.
Formally, let $I$ be the set of internal identifiers, $N$ the set of external data names, and $V$ the set of atomic values. We do not assume any specific nature of $V$; in particular, it may contain numerals, strings, texts, graphics, compiled procedures, and so on. Atomicity means that we do not assume the existence of operations that refer to parts of these values. We assume $I \cap V = \emptyset$. $N \cap V$ need not be empty: it is possible, even common, that programs compute some values and then use them as names, e.g. integers used as array indices. A storage object is a triple $\langle i, n, v \rangle$, where $i \in I$ and $n \in N$ are its identifier and name, respectively, and $v$ is its value. We often say that $i$ identifies this object. The value can be one of the following: (1) an atomic value from $V$; (2) an identifier $j$ from $I$, which, as the value of the storage object, serves as a (logical) pointer to another storage object; (3) a set of storage objects.
We refer to these three kinds of objects as value-objects, pointer-objects, and set-objects, respectively. Objects of the first two kinds are also called atomic, and those of the third kind are called complex. Below is an example of a complex storage object.
$$\langle i_5, \mathrm{EMP}, \{\langle i_6, \mathrm{NAME}, \text{Smith}\rangle, \langle i_7, \mathrm{SAL}, 2000\rangle, \langle i_8, \mathrm{WORKS\_IN}, i_{17}\rangle\}\rangle$$
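The definition can be transcribed almost literally into code. The following Python sketch is one possible encoding, based on assumptions of our own. Identifiers are strings such as "i5", and names are strings or integers. A pointer value is wrapped in a hypothetical Ref class so that it cannot be confused with an atomic value, mirroring $I \cap V = \emptyset$. A complex value is a frozenset of storage objects. The sketch rebuilds the EMP object shown above and classifies each object as a value-, pointer- or set-object. The pointer to i17 is left unresolved here, because the object it identifies belongs to a larger database instance.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

Atomic = Union[int, float, str, bool]  # stand-in for the set V of atomic values


@dataclass(frozen=True)
class Ref:
    """Pointer value: the internal identifier of another storage object."""
    target: str


@dataclass(frozen=True)
class StorageObject:
    """A storage object <i, n, v>."""
    ident: str               # i: internal identifier
    name: Union[str, int]    # n: external name (numbers allowed, e.g. for arrays)
    value: Union[Atomic, Ref, FrozenSet["StorageObject"]]  # v

    def kind(self) -> str:
        # The three cases of the definition: a pointer, a set of objects, or an atomic value.
        if isinstance(self.value, Ref):
            return "pointer-object"
        if isinstance(self.value, frozenset):
            return "set-object"
        return "value-object"


# <i5, EMP, {<i6, NAME, Smith>, <i7, SAL, 2000>, <i8, WORKS_IN, i17>}>
smith = StorageObject("i5", "EMP", frozenset({
    StorageObject("i6", "NAME", "Smith"),
    StorageObject("i7", "SAL", 2000),
    StorageObject("i8", "WORKS_IN", Ref("i17")),
}))

print(smith.kind())                                 # set-object
print(sorted(part.kind() for part in smith.value))  # ['pointer-object', 'value-object', 'value-object']
```

We use a frozenset for the complex value because the components of a set-object are unordered. It also keeps the whole structure hashable, so that set-objects can themselves be nested inside other sets.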
A database instance is a set of storage objects. We assume that it satisfies some obvious constraints. An element $i \in I$ is used in it at most once as an identifier, so there is a one-to-one correspondence between storage objects and identifiers. If an identifier is used as a pointer, then it also identifies some object (referential integrity). The above definition is not the only possible one. We could, for instance, adopt an approach in which the third kind of value is a set of identifiers rather than a set of storage objects. Further generalization of this idea leads to the model presented in [Subi85, SuMi86, SuRz87], where a database instance is a relation, i.e., a subset of $I \times N \times (V \cup I)$. Our approach allows a somewhat more direct modelling of database objects as bounded units, and it makes the semantics of updating operations, e.g. delete, easier to define. We do not consider potential extensions of the storage object concept, for example objects without a name or objects with more than one name (as we will see later, the latter extension is reasonable for some purposes).
Footnote 3. We already have some ideas concerning the typing of QLs; they will be the subject of a subsequent paper.
We emphasize that the identifiers in $I$ are internal identifiers. They are not used in queries or programs and are not printable. Consequently, a one-to-one mapping of all identifiers onto another collection of identifiers (a permutation of the identifiers) cannot be recognized from outside. A database instance is a representative of the class of databases that can be obtained from it by one-to-one identifier mappings. In contrast, names are external in the sense that they are used in queries and are part of the user's model. Names need not be unique. As our examples clearly show, this freedom allows bulk data to be represented. From now on, "object" always means storage object, unless specifically noted otherwise. Below we present an example database instance. Note that each storage object has a unique identifier and that there are no dangling pointers.
Example MyDatabase:
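$$\langle i_1, \mathrm{EMP}, \{\langle i_2, \mathrm{NAME}, \text{Brown}\rangle, \langle i_3, \mathrm{SAL}, 2500\rangle, \langle i_4, \mathrm{WORKS\_IN}, i_{13}\rangle\}\rangle,$$
$$\langle i_5, \mathrm{EMP}, \{\langle i_6, \mathrm{NAME}, \text{Smith}\rangle, \langle i_7, \mathrm{SAL}, 2000\rangle, \langle i_8, \mathrm{WORKS\_IN}, i_{17}\rangle\}\rangle,$$
$$\langle i_9, \mathrm{EMP}, \{\langle i_{10}, \mathrm{NAME}, \text{Jones}\rangle, \langle i_{11}, \mathrm{SAL}, 1500\rangle, \langle i_{12}, \mathrm{WORKS\_IN}, i_{17}\rangle\}\rangle,$$
$$\langle i_{13}, \mathrm{DEPT}, \{\langle i_{14}, \mathrm{DNAME}, \text{Toy}\rangle, \langle i_{15}, \mathrm{LOC}, \text{Paris}\rangle, \langle i_{16}, \mathrm{LOC}, \text{London}\rangle\}\rangle,$$
$$\langle i_{17}, \mathrm{DEPT}, \{\langle i_{18}, \mathrm{DNAME}, \text{Sales}\rangle, \langle i_{19}, \mathrm{LOC}, \text{Berlin}\rangle\}\rangle$$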
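The two constraints on database instances can be checked mechanically. The first is that each identifier is used at most once; the second is that every pointer identifies an existing object. The sketch below is self-contained and uses an encoding of our own choosing. A storage object is a plain (identifier, name, value) tuple, a complex value is a Python list, and pointers are wrapped in a hypothetical Ref named tuple. The helper names walk and check_instance are illustrative only.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple, Tuple


class Ref(NamedTuple):
    """Pointer value: the identifier of another storage object."""
    target: str


# MyDatabase from the text as (identifier, name, value); lists are complex values.
MY_DATABASE = [
    ("i1", "EMP", [("i2", "NAME", "Brown"), ("i3", "SAL", 2500), ("i4", "WORKS_IN", Ref("i13"))]),
    ("i5", "EMP", [("i6", "NAME", "Smith"), ("i7", "SAL", 2000), ("i8", "WORKS_IN", Ref("i17"))]),
    ("i9", "EMP", [("i10", "NAME", "Jones"), ("i11", "SAL", 1500), ("i12", "WORKS_IN", Ref("i17"))]),
    ("i13", "DEPT", [("i14", "DNAME", "Toy"), ("i15", "LOC", "Paris"), ("i16", "LOC", "London")]),
    ("i17", "DEPT", [("i18", "DNAME", "Sales"), ("i19", "LOC", "Berlin")]),
]


def walk(objects: list) -> Iterator[tuple]:
    """Yield every storage object, including nested ones (depth first)."""
    for obj in objects:
        yield obj
        value = obj[2]
        if isinstance(value, list):
            yield from walk(value)


def check_instance(db: list) -> Tuple[List[str], List[str]]:
    """Return (identifiers used more than once, pointers to missing identifiers)."""
    counts = Counter(ident for ident, _, _ in walk(db))
    duplicates = sorted(ident for ident, n in counts.items() if n > 1)
    dangling = sorted(
        value.target
        for _, _, value in walk(db)
        if isinstance(value, Ref) and value.target not in counts
    )
    return duplicates, dangling


print(len(list(walk(MY_DATABASE))))  # 19
print(check_instance(MY_DATABASE))   # ([], [])
broken = [("i1", "X", [("i1", "Y", 1)]), ("i2", "P", Ref("i99"))]
print(check_instance(broken))        # (['i1'], ['i99'])
```

Renaming all identifiers consistently (i1 to j1, and so on) would leave both checks unaffected. This matches the remark that a permutation of identifiers cannot be observed from outside.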
In Figure 2 we illustrate the MyDatabase example graphically.
[Figure 2 diagram: the objects of MyDatabase, i1 to i19, drawn as nested boxes, with the pointers i4 → i13, i8 → i17 and i12 → i17.]
Figure 2: Nesting of storage objects
Having defined the model, let us now consider how it can represent databases given in one of the common conceptual models. Let us start with the relational model. In this model, when we want to deal with employees, we may have a relation called EMP. This name is thus associated with the relation, i.e., with the set of tuples. However, we are interested in names in the context of binding. Consider a typical query like "select SAL from EMP". During execution of this query, the name EMP gets bound to each tuple of the relation in turn. Therefore, in our model the name EMP is associated not with the relation but with its tuples. In the example above, we obtain a relational database if we store department names (instead of pointers) in the WORKS_IN objects and disallow repeated LOC attributes.
Now consider the representation of tuples. The example contains five tuples: three employees and two departments. We first note that each tuple is a storage object and therefore has an identifier. This identifier has nothing to do with the conceptual model; it is essentially an abstract representation of the internal identity of the tuple. (Many relational systems indeed use tuple identifiers internally.) Also note that the tuple's value is a set of storage objects, each of which represents one attribute value within the tuple. The attribute is simply a name that, in the context of this tuple, is bound to a certain stored value. The same attribute also appears in the other tuples of the relation. Finally, note that our model has no tuple data structure: a tuple is represented by the set of its components. Indeed, for our task a tuple is simply a local environment, i.e., a set of names together with the stored values that can be bound to them. The concept of a tuple as a data structure, and the associated constraint that all tuples in a given relation have the same attributes, are irrelevant in our context.
To generalize to the nested relational or complex object models, we need to be able to represent sets, which our model obviously provides. Note that a set of values cannot be the value (i.e., the third component) of an object. While this possibility has advantages for conceptual modeling, it would violate the storage principle [SuMi86]. That principle says that each stored value that can be distinguished in the structure should have a unique internal identifier, and it is strongly motivated by engineering requirements. Thus we view a set as a collection of stored values, each of which is represented as a storage object. Also note that, just as for relations, if the set has a name in the conceptual model, this name is associated with each element of the set.
We can easily represent lists. For example, a list is a set of storage objects
$$\langle i_1, N_1, \{\langle i_2, N_2, value\rangle, \langle i_3, N_3, i_4\rangle\}\rangle$$
where each pointer (here i4) leads to the next list element; the last element does not contain the pointer. Similarly, one can represent trees and, more generally, values of recursively defined data types. Our storage model also allows for representing stored bags. For example, the storage object
$$\langle i_1, N, \{\langle i_2, M, 5\rangle, \langle i_3, M, 5\rangle\}\rangle$$
represents the bag {5, 5}. Names can also be numbers, for example,
$$\langle i_1, A, \{\langle i_2, 1, Monday\rangle, \langle i_3, 2, Tuesday\rangle, \ldots, \langle i_8, 7, Sunday\rangle\}\rangle$$
In the terminology of programming languages such a structure is called an array. Note that the names used in arrays are also values, and are routinely calculated at run time. Compared with other data structures, e.g. records, arrays have only one essential property: the names of their elements can be calculated at run time (they are first-class citizens). This property, however, is not essential for our presentation. In LOQIS each data name is first-class; we assume that the construct [e] denotes a data name obtained by evaluating the expression e; for example, if x = 5 then [x + 2] denotes the name 7, and A.[x + 2] identifies the Sunday element of the array above. Consequently, the difference between arrays and other type constructors is neglected. We have already stated that we do not impose the constraint that objects with the same name have the same structure. This allows us to represent variants, as in the following database fragment, again without having this concept in the model:
$$\langle i_1, EMP, \{\langle i_2, NAME, Brown\rangle, \langle i_3, SAL, 2500\rangle\}\rangle \quad \langle i_6, EMP, \{\langle i_7, NAME, Smith\rangle, \langle i_9, IS\_STUDENT, true\rangle\}\rangle$$
The existence of a field serving as a discriminator of a variant is not obligatory. For the same reason, we can easily represent null values, when they stand for lack of
stored information. The main reason for dealing with null values is that a particular piece of information does not fit exactly into the predefined format. Because the relational model is heavily based on predefined tuple formats, the problem of null values has received much attention. Null values cannot be avoided in real-life database systems, and thus cannot be avoided in QLs. Since our definition of database instances does not include any concept of format, such a null value is represented simply by the absence of information. For example,
$$\langle i_1, EMP, \{\langle i_2, NAME, Brown\rangle, \langle i_3, JOB, programmer\rangle\}\rangle \quad \langle i_6, EMP, \{\langle i_7, NAME, Smith\rangle\}\rangle$$
means that JOB for Smith is null-valued. Note that we do not associate any semantics with this absence. As far as the above example is concerned, Smith may have no job at all, his job may be unknown, or his job may be determined in another way. For example, assume that almost all employees are clerks; then the JOB information is written explicitly only for those who are not clerks. The interpretation of the absence of information belongs to the realm of conceptual modeling and is outside the scope of our model. Similarly, if the conceptual model stores nulls as special values (marked nulls), these will simply be storage objects for us. Of course, we need the ability to test for the lack of a stored object with a given name, so that the query evaluator can interpret this and apply the appropriate semantics, as determined by the conceptual model. In summary, this simple model allows one to represent a variety of data structures and concepts, including records/tuples, arrays, (nested) relations, sets and bags, and their combinations; variants and null values; (complex) objects, object sharing, and instances of recursive types. As shown in Fig. 1, our model is only a formal tool which makes it possible to represent the conceptual modelling primitives of a particular data model. In many cases this representation is quite straightforward; this concerns the relational model and NF2 models. For semantic models such as the entity-relationship model, functional models, IFO, etc. there are several methods of representing their primitives. For example, an instance of a binary relationship can be represented as two pointer objects inserted into the entities to be connected, as a special object containing two pointer objects, or in another way. The main advantage of the model is its level of abstraction, which allows us to explain the properties of QLs without going into the details of various special cases.
Some issues, in particular generalization/specialization relationships, object-oriented concepts, and ordered bulk data (sequences), require further features of the model, without violating the basic assumptions. Some extensions will be discussed later.
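The storage model lends itself to a direct encoding. The Python sketch below is only an illustration under stated assumptions: the classes Obj and Ref and the helpers children, has and walk are hypothetical names (not part of LOQIS), and a Python list stands for the set of sub-objects of a complex object. It reproduces the list, bag, array and null-value examples above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Ref:
    """A pointer value: the identifier of another storage object."""
    target: str


@dataclass
class Obj:
    """A storage object <identifier, name, value> (hypothetical encoding)."""
    oid: str
    name: Union[str, int]  # names may also be numbers, as in arrays
    value: Union[int, str, bool, Ref, List["Obj"]]  # a list encodes the set/bag of sub-objects


def children(obj: Obj, name: Union[str, int]) -> List[Obj]:
    """All sub-objects of a complex object that carry the given name."""
    if not isinstance(obj.value, list):
        return []
    return [c for c in obj.value if c.name == name]


def has(obj: Obj, name: Union[str, int]) -> bool:
    """Test for the absence of a stored object with a given name (a 'null')."""
    return len(children(obj, name)) > 0


# A bag {5, 5}: two distinct storage objects with the same name and value.
bag = Obj("i1", "N", [Obj("i2", "M", 5), Obj("i3", "M", 5)])

# An array: element names are the numbers 1..7.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
week = Obj("i1", "A", [Obj(f"i{k + 1}", k, d) for k, d in enumerate(days, start=1)])

# Null values: Smith simply has no JOB sub-object.
brown = Obj("i1", "EMP", [Obj("i2", "NAME", "Brown"), Obj("i3", "JOB", "programmer")])
smith = Obj("i6", "EMP", [Obj("i7", "NAME", "Smith")])

# A two-element list: the last node carries no pointer object.
node2 = Obj("i4", "N1", [Obj("i5", "N2", "b")])
node1 = Obj("i1", "N1", [Obj("i2", "N2", "a"), Obj("i3", "N3", Ref("i4"))])
store: Dict[str, Obj] = {o.oid: o for o in (node1, node2)}


def walk(first: Obj) -> list:
    """Follow the N3 pointers from the first node, collecting the N2 values."""
    out, node = [], first
    while node is not None:
        out.extend(c.value for c in children(node, "N2"))
        nxt = children(node, "N3")
        node = store[nxt[0].value.target] if nxt else None
    return out


x = 5
print([c.value for c in children(week, x + 2)])  # computed name [x + 2]: ['Sunday']
print([c.value for c in children(bag, "M")])     # [5, 5]
print(has(brown, "JOB"), has(smith, "JOB"))      # True False
print(walk(node1))                               # ['a', 'b']
```

Note that has() only reports whether a sub-object is stored; as stated above, what its absence means is left to the conceptual model.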
3 An Abstract Machine Model
In this section we present an abstract machine model on which query language expressions can be evaluated. One component of the state of the machine is a database instance represented in the storage model, as described in the previous section. Here we describe the other components of the state, that is, the structures used for computation and for query evaluation. These include an environment stack and a query result stack. The environment stack, as usual, determines scoping and binding. The result stack is a storage for intermediate query results, used both for the evaluation of query operators and for the evaluation of arithmetic-style expressions (we make no distinction between these cases). We discuss some basic commands of the machine, and its facility for parallel execution.
3.1 The Environment Stack
When a query is evaluated, we first need to evaluate its atomic components (names, constants, etc.), and then to build the meaning of larger constructs (arithmetic expressions, subqueries, queries) from the meaning of their components. Evaluating names means binding them to data units. Such a binding depends on the context. Just as the denotations of identifiers depend on the procedure in which they appear, the denotations of attributes in a select list depend on the bindings of the relation names listed in the from clause. As a simple example, consider a query in a relational database that asks for the names of employees satisfying some condition:
select NAME from EMP where ...
Semantically, the query implies a loop in which the binding of EMP is iteratively changed; each binding of NAME is determined by the current binding of EMP: for each EMP tuple, NAME is bound to the subobject that holds the NAME attribute value of that employee. While bindings for simple relational queries such as the one above seem obvious, this is not the case for more complex queries. For example, in a query that selects the salaries of employees who earn more than their managers, SAL is used three times, and we need to define how each occurrence is bound. The problem is aggravated in models that, unlike the relational model, admit deeply nested structures, possibly with repeated (sub)attributes. The binding of a name (i.e. its meaning) may then depend both on the position of the name in the data structure and on its use in the query. In the terminology of programming languages, the issue here is essentially that of scope rules. To deal with it, we propose a mechanism that is well known in programming languages, namely that of an environment. In typical PLs, an environment is an association of names and objects, and we have reflected that in our definition of the storage model. Because for various reasons we need to restrict the scope of a particular name occurring in a program, the environment is subdivided into parts (called sections), which form a stack. Binding a name involves a procedure which looks for a proper object, starting from the top of the stack and omitting irrelevant stack sections. In classical PLs the search is done at compile time, so the stack exists in two versions: during compilation (the so-called static environment) and at run time. The run-time stack consists of object values only, since after binding explicit names need not be stored. Following PLs, we represent the environment by a stack, which we call the environment stack and denote ES.
In comparison with PLs, we have several reasons to change its construction. Because we would like to abstract from compilation, and because of late binding, we need full information about data structures and data names at run time, and this is already taken into account in the storage model. In classical PLs a stack section has a fixed format at run time, but this is not the case for QLs. For example, bulk data (relations, repeated attributes), text or multimedia data lead to variable formats. The stack is usually a main-memory structure, while the data are stored in secondary storage; this means that some data on the stack must be represented by pointers to secondary storage. The last reason for changing the stack construction ultimately leads to the pointer variant: storage objects can be shared between different stack sections. For example, a stack section may contain a storage object EMP, but during evaluation of some query we need to build another section (a local environment) with the storage objects NAME, SAL and WORKS_IN, which are attributes of the EMP object. Therefore, to unify the treatment, we assume that storage objects are stored in some pool independent of the stack, and the stack stores only pointers to them. Hence, each stack section is a set of data identifiers. The structure is illustrated in Fig. 3.
[Figure 3: Storage objects and the environment stack. Panels: "The environment stack" (sections from Top to Bottom), "Volatile Objects (local to procedures, modules, ...)", "MyDatabase".]
The environment stack is presented at a level that supports conceptual clarity and uniformity. We abstract from methods aimed at performance, ease of programming and reliability. In particular, in an implementation some objects can be stored directly on the stack,
which removes one level of indirection. The run-time search of the stack can be partly avoided, since the section of the stack that is relevant for a particular binding can be determined at compile time. Since the stack is usually a main-memory structure, while data are stored in secondary storage, it may be reasonable to introduce redundancy: the stack stores pairs ⟨name, identifier⟩ rather than identifiers alone. This makes it possible to avoid many disk operations. In LOQIS we use some abbreviations in the representation of identifiers; in particular, the identifiers of all objects subordinate to the object with identifier i are represented by i with a special flag. Some additional pointers, leading from a section to the next section to be visited during the search, can also improve performance. In LOQIS we avoid a direct representation of sets of identifiers of objects having the same name, for example, the identifiers of all employees or all departments; this is implemented as an additional level of indirection in the data structure. We assume that at the beginning of the evaluation process the environment stack consists of one section containing the identifiers of all database "records", i.e. the objects at the top hierarchy level of the database instance, such as EMP and DEPT in MyDatabase. (As noted above, simple methods make it possible to avoid long lists of identifiers.) Some other assumptions were tested in LOQIS. For example, the entity-relationship model has the concept of weak entities, which exist only together with their super-entities (e.g. the children of an employee). Removing an entity implies removing all weak entities subordinate to it, but for retrieval weak entities behave as ordinary entities. Such an entity can be modelled as a sub-object of another object, with its own identifier nevertheless included in the initial section of the environment stack.
We also tested the situation where the initial section also contains identifiers of relationships; this leads to somewhat different pragmatic rules for the query language. In real systems the structure of data repositories and their behaviour can be complex, so the rules for the initial filling of the environment stack may be more sophisticated. In summary, our proposal changes the environment stack as used in programming languages in two major ways: (1) the stack contains pointers to storage objects rather than the objects themselves; (2) a search needed to bind a name occurring in a query/program may return multiple bindings, i.e. many objects can be bound to a single name. This is intimately related to the parallelism inherent in the semantics of queries.
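A minimal Python sketch of an environment stack whose sections hold identifiers rather than objects, and whose binding may return several objects. It assumes the typical scope rule of Sec. 3.2 (search from the top and stop at the first section that contains objects with the given name); the class EnvironmentStack and the NAMES table are hypothetical, and refinements such as skipping the sections of a calling procedure are not modelled.

```python
from typing import Dict, List, Set

# Hypothetical name table for a MyDatabase-like store (identifier -> object name).
# i127..i130 stand for volatile objects local to procedures, as in Fig. 3.
NAMES: Dict[str, str] = {
    "i1": "EMP", "i5": "EMP", "i9": "EMP", "i13": "DEPT", "i17": "DEPT",
    "i127": "X", "i128": "Y", "i129": "Z", "i130": "T",
}


class EnvironmentStack:
    """A stack of sections; each section is a set of object identifiers."""

    def __init__(self, initial: Set[str]) -> None:
        self.sections: List[Set[str]] = [set(initial)]  # bottom section

    def push(self, section: Set[str]) -> None:
        """Open a new scope."""
        self.sections.append(set(section))

    def pop(self) -> None:
        """Close the innermost scope."""
        self.sections.pop()

    def bind(self, name: str) -> List[str]:
        """Search from the top; stop at the first section that holds objects
        named `name` and return all of them (a binding may be multiple)."""
        for section in reversed(self.sections):
            found = [i for i in section if NAMES.get(i) == name]
            if found:
                return sorted(found, key=lambda i: int(i[1:]))  # deterministic order
        return []  # bottom of the stack reached: the name is unbound


es = EnvironmentStack({"i1", "i5", "i9", "i13", "i17"})  # top-level database objects
es.push({"i127", "i128"})
es.push({"i129", "i130"})
print(es.bind("EMP"))   # ['i1', 'i5', 'i9'] (found in the bottom section)
print(es.bind("Z"))     # ['i129']
es.push({"i5"})         # a scope that happens to contain one EMP object
print(es.bind("EMP"))   # ['i5'] (the search stops at the top section)
es.pop()
print(es.bind("SAL"))   # []
```

Keeping only identifiers in the sections mirrors point (1) above: the same object can appear in several sections without being copied.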
3.2 Binding and Opening a New Scope
Binding a particular name n occurring in a query implies a search for the object(s) named n in the environment stack. The search follows scope rules, which in the typical case are as follows. The search starts from the top of ES and terminates when the object(s) are found or the bottom of the stack is reached. The name is bound to all objects that have the given name and whose identifiers are in the ES section where the search terminated. Note that all objects named n from one ES section are bound to the name n. The scope rules must sometimes be more sophisticated because of various locality concepts in PLs. For example, if procedure p1 calls procedure p2, then the local objects of p1 should not be visible during the binding of names occurring in p2. This means that the ES section containing the identifiers of local objects of p1 should be omitted during the search. Further rules are the consequence of nested program blocks, modules (distinguishing specification
and implementation objects), ADTs, classes and inheritance, viewers [SMSRW93], specific methods of parameter passing in procedures, and perhaps other techniques. We discuss these problems in more detail later. While a search may go deep into the stack ES, updates of the stack can only be performed at the top. We allow only the traditional stack operations, namely push(s) and pop, where s is a section. Since a section represents a scope, they correspond to opening a new scope and closing a scope, respectively. In programming languages, opening a new scope corresponds, e.g., to the activation of a block or a procedure. In a query language, it corresponds additionally to the need to evaluate a query component in a context determined by another component, and it is related both to the structure of the query and to the structure of the data. For example, in the query N1.N2, each possible binding for N2 is determined by a binding for N1. That is, first N1 is bound, in general to many identifiers i1, i2, ..., ik. Each such identifier ij defines a scope, consisting of the identifiers of the objects nested in the object ij. The semantics of the dot operation is that the binding for N2 is iteratively determined in these scopes. As another example, the semantics of the query R ⋈ S, where ⋈ is some join operation involving names of attributes and a comparison, can be described using the nested-loop approach by the following procedure: (1) Determine a binding for R; this is an identifier of a tuple (tid) of the relation R. This binding determines a new scope with bindings for the attribute names of R. (2) Using the same environment, determine a binding for S (i.e. the tid of a tuple of S). (3) Evaluate the condition in an environment where these two scopes are the top ES sections; if it is true, compute the result tuple and add it to the join result.
Both the condition and the result are defined in terms of attribute names of R and S, so the bindings for these names should be found in the top two sections of the stack.⁴ Let i be an identifier. We denote by nested(i) the following set of identifiers: if i identifies a set object, then nested(i) contains the identifiers of all objects in the set. If i identifies a pointer object ⟨i, n, j⟩, then nested(i) = {j}. For uniformity we assume that if v ∈ V then nested(v) = ∅. We then extend the function to arguments that are sets/sequences of identifiers: the result is the union of the partial results. For example (see MyDatabase), nested(i1) = {i2, i3, i4}, nested(i4) = {i13}, nested(2500) = ∅, and nested(⟨i1, i13⟩) = {i2, i3, i4, i14, i15, i16}. Formally, the function nested also has an (implicit) argument, namely the database instance; for brevity we omit it. In this paper our main concern is queries of the form q1 θ q2, where q1 and q2 are atomic or compound queries, and θ is an operator with some tradition in the QL domain: θ can be where (selection), "." (projection, navigation), ⋈ (a variant of join), ∀ and ∃ (quantifiers), order by (sorting), closed by (transitive closure), etc. In our approach all these operators are defined by applying the same formal mechanism. The idea of the mechanism lies in the iterative evaluation of q2 in new environments, which are determined by the tuples returned by q1. Thus q2 returns as many results as the number of rows returned by q1. The final result of q1 θ q2 is some combination of the result returned by q1 and the results returned by q2. The rôle which we assume for the function nested is the following: for evaluation of
4 Note that relational structures can easily be modelled in our model if we assume that a tid is a pair ⟨relation name, primary key value(s)⟩ and that an attribute value identifier is a pair ⟨attribute name, tid⟩.
1800). Indeed, queries of SBQL are a generalization of PLs' expressions; e.g. 2 × 2 is a query. We assume full orthogonality of operators, i.e. they can be used in parenthesized expressions to any level of nesting. To improve readability, we omit some parentheses according to the typical precedence rules for arithmetic expressions. In many cases we apply the rule that evaluation proceeds from left to right. For example, a.b.c.d.e should be understood as (((a.b).c).d).e. We also assume that the dot operator has the highest precedence, and we usually omit parentheses around a predicate written after the operator where. To deal with the semantics of SBQL we introduce the procedure eval. There are two views of this procedure, denotational and operational. In the denotational view, eval is a function eval : SBQL → (DBI × ES → … nested′(i ) = nested(i ). The advantage is that the model is apparently more conceptual, and queries are shorter. A disadvantage concerns updating of references, which is a useful feature in real systems; see the example ChangeDept in the next section. To update a pointer object, we must return its identifier rather than the identifier that is its value. This makes it necessary to distinguish between the output of (EMP where NAME = "Smith").WORKS_IN and that of (EMP where NAME = "Smith").WORKS_IN.DEPT. We can also apply a combined solution, assuming the function nested″(r) = nested(r) ∪ nested′(r). In this case we obtain all the capabilities mentioned, but such an idea introduces some ambiguity and may be more difficult for a type system. We stress here that (in contrast to other formal approaches) our framework makes it possible to consider such semantic details.
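To make the iteration pattern behind q1 θ q2 concrete, here is a small Python sketch of the stack-based evaluation of where and the dot operator, using nested as defined above. It is a sketch under several assumptions: queries are given as nested tuples instead of being parsed from strings, the names eval_, bind, deref and run are hypothetical, tables are lists of row tuples, and the contents of the store are invented except for the identifiers and Brown's salary taken from MyDatabase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    target: str  # pointer value: the identifier of another object


class Oid(str):
    """An object identifier, kept distinct from ordinary string values."""


# Hypothetical store (identifier -> (name, value)) shaped like MyDatabase;
# apart from Brown's record, names, salaries and departments are invented.
DB = {
    "i1": ("EMP", ["i2", "i3", "i4"]), "i2": ("NAME", "Brown"),
    "i3": ("SAL", 2500), "i4": ("WORKS_IN", Ref("i13")),
    "i5": ("EMP", ["i6", "i7", "i8"]), "i6": ("NAME", "Smith"),
    "i7": ("SAL", 2000), "i8": ("WORKS_IN", Ref("i13")),
    "i9": ("EMP", ["i10", "i11", "i12"]), "i10": ("NAME", "Jones"),
    "i11": ("SAL", 1500), "i12": ("WORKS_IN", Ref("i17")),
    "i13": ("DEPT", ["i14", "i15", "i16"]), "i14": ("DNAME", "Trade"),
    "i15": ("LOC", "Paris"), "i16": ("LOC", "Rome"),
    "i17": ("DEPT", ["i18", "i19"]), "i18": ("DNAME", "Ads"), "i19": ("LOC", "Berlin"),
}
ES = [[Oid(i) for i, (n, _) in DB.items() if n in ("EMP", "DEPT")]]  # initial section
QRES = []  # result stack: each entry is a table, i.e. a list of row tuples


def nested(row):
    """nested() extended to a row: the union of nested(e) over its elements."""
    out = []
    for e in row:
        if isinstance(e, Oid):
            value = DB[e][1]
            if isinstance(value, list):   # complex object -> its sub-objects
                out.extend(Oid(i) for i in value)
            elif isinstance(value, Ref):  # pointer object -> its target
                out.append(Oid(value.target))
        # atomic values contribute nothing: nested(v) is empty
    return out


def bind(name):
    """Search ES from the top and stop at the first section holding `name`."""
    for section in reversed(ES):
        found = [(i,) for i in section if DB[i][0] == name]
        if found:
            return found
    return []


def deref(e):
    """Replace the identifier of an atomic object by its stored value."""
    if isinstance(e, Oid) and not isinstance(DB[e][1], (list, Ref)):
        return DB[e][1]
    return e


def eval_(q):
    """Evaluate a query given as a nested tuple, leaving its result on QRES."""
    op = q[0]
    if op == "name":
        QRES.append(bind(q[1]))
    elif op == "const":
        QRES.append([(q[1],)])
    elif op == ">":
        eval_(q[1])
        eval_(q[2])
        right, left = QRES.pop(), QRES.pop()
        ok = bool(left and right) and deref(left[0][0]) > deref(right[0][0])
        QRES.append([(ok,)])
    elif op in ("where", "dot"):
        eval_(q[1])
        result = []
        for r in QRES[-1]:
            ES.append(nested(r))            # open a new scope on ES
            eval_(q[2])
            if op == "where":
                if QRES[-1] and QRES[-1][0][0] is True:
                    result.append(r)        # keep r: the predicate holds
            else:
                result.extend(QRES[-1])     # dot: collect the rows of q2
            QRES.pop()                      # cancel the result of q2
            ES.pop()                        # restore the previous state of ES
        QRES.pop()                          # cancel the result of q1
        QRES.append(result)
    else:
        raise ValueError(f"unknown operator {op!r}")


def run(q):
    eval_(q)
    return QRES.pop()


rich = run(("where", ("name", "EMP"), (">", ("name", "SAL"), ("const", 1800))))
print(rich)  # [('i1',), ('i5',)]
path = ("dot", ("dot", ("dot", ("name", "EMP"), ("name", "WORKS_IN")),
                ("name", "DEPT")), ("name", "DNAME"))
print([deref(r[0]) for r in run(path)])  # ['Trade', 'Trade', 'Ads']
```

Both operators share the same loop; they differ only in how the final result is combined, which is the point of the definitional pattern.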
6 Followers of functional approaches to databases may use the syntax n(q) instead of q.n, for consistency with standard functional notation.
4.8 Navigational Join
The join defined through the cartesian product followed by a selection allows us to create pairs of identifiers of objects that satisfy some condition. A case of particular interest is the navigational join, where the result is again a set of pairs, but now the second object in each pair is reachable from the first by some path (i.e. by using "dot"). For example, we might want to create a set of EMP and DEPT object pairs that represent the WORKS_IN relationship. Such a query can be written as a product followed by a selection (assuming some additional operators are available). However, it is useful to have a more direct expression that navigates via WORKS_IN rather than using a product. Thus we modify the definition of the dot operator so that it traverses a link and returns both endpoints. Following our definitional pattern for eval, we present below a definition which (as we will see) covers a more general case. The syntax is q1 ⋈ q2. Let ⟨r⟩ denote the single-row table obtained from the row r, and let the symbol ⊗ denote the "horizontal" composition of bags, "each row with each row" (a natural generalization of the cartesian product). The semantics of the construct is determined by the following part of the eval procedure:
procedure eval(query: string);
begin
  ...
  if query is recognized as q1 ⋈ q2 then
  begin
    var RESULT: Table;
    RESULT := ∅;
    eval(q1);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));              (* Open a new scope on ES *)
      eval(q2);
      RESULT := RESULT ⊔ (⟨r⟩ ⊗ top(QRES));
      pop(QRES);                        (* Cancel the result of q2 *)
      pop(ES);                          (* Restore the previous state of ES *)
    end;
    pop(QRES);                          (* Cancel the result of q1 *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);
For each tuple r returned by q1 we combine the tuple with each tuple returned by q2 for this r. The result is the union of all such combinations. Note that, in comparison with the previous definitions, the change concerns only how the final result is formed.
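The definition above can be transcribed almost line by line. The following Python sketch assumes a hypothetical encoding: queries are nested tuples, rows are Python tuples (so ⟨r⟩ ⊗ top(QRES) becomes the concatenation of r with every row of the table of q2, and ⊔ becomes list concatenation), and the store is an invented fragment of MyDatabase. The names eval_, bind, nested and run are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    target: str  # pointer value: the identifier of another object


class Oid(str):
    """An object identifier, kept distinct from ordinary string values."""


# Hypothetical fragment of MyDatabase: EMP objects with WORKS_IN pointers to DEPT.
DB = {
    "i1": ("EMP", ["i4"]), "i4": ("WORKS_IN", Ref("i13")),
    "i5": ("EMP", ["i8"]), "i8": ("WORKS_IN", Ref("i13")),
    "i9": ("EMP", ["i12"]), "i12": ("WORKS_IN", Ref("i17")),
    "i13": ("DEPT", []), "i17": ("DEPT", []),
}
ES = [[Oid(i) for i, (n, _) in DB.items() if n in ("EMP", "DEPT")]]
QRES = []


def nested(row):
    """Sub-objects of complex objects and targets of pointer objects in a row."""
    out = []
    for e in row:
        value = DB[e][1] if isinstance(e, Oid) else None
        if isinstance(value, list):
            out.extend(Oid(i) for i in value)
        elif isinstance(value, Ref):
            out.append(Oid(value.target))
    return out


def bind(name):
    for section in reversed(ES):                # search from the top of ES
        found = [(i,) for i in section if DB[i][0] == name]
        if found:
            return found
    return []


def eval_(q):
    op = q[0]
    if op == "name":
        QRES.append(bind(q[1]))
    elif op in ("dot", "join"):
        eval_(q[1])
        result = []                             # RESULT := empty bag
        for r in QRES[-1]:
            ES.append(nested(r))                # open a new scope on ES
            eval_(q[2])
            if op == "join":
                result.extend(r + s for s in QRES[-1])  # <r> composed with each row of q2
            else:
                result.extend(QRES[-1])
            QRES.pop()                          # cancel the result of q2
            ES.pop()                            # restore the previous state of ES
        QRES.pop()                              # cancel the result of q1
        QRES.append(result)
    else:
        raise ValueError(f"unknown operator {op!r}")


def run(q):
    eval_(q)
    return QRES.pop()


print(run(("join", ("name", "EMP"), ("name", "WORKS_IN"))))
# [('i1', 'i4'), ('i5', 'i8'), ('i9', 'i12')]
print(run(("join", ("name", "EMP"), ("dot", ("name", "WORKS_IN"), ("name", "DEPT")))))
# [('i1', 'i13'), ('i5', 'i13'), ('i9', 'i17')]
```

In the second query, ES holds three sections while DEPT is being bound, and the search stops at the top one, which contains only the DEPT object reached through the pointer.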
Examples
EMP ⋈ WORKS_IN
The query returns a two-column table, where each row contains the identifier of an EMP object and the identifier of a WORKS_IN object nested in that EMP object.
EMP ⋈ (WORKS_IN.DEPT)
returns the two-column table we wanted at the beginning; the operator ⋈, however, is fairly general and covers many other interesting cases. Note that in the above example, during the binding of DEPT the environment stack contains three sections, and DEPT object(s) are pointed to from the first and the third section of the stack; the name DEPT is bound to the single DEPT object pointed to from the third stack section.
EMP ⋈ (DEPT where EDNO = DNO)
is another (relational) variant of the previous example.
EMP ⋈ (WORKS_IN.DEPT.(DNAME × LOC))
returns a three-column table, where identifiers of EMP are associated with identifiers of DNAME and LOC nested in the proper DEPT. Note that the cartesian product acts on a 1 × 1 table and a single-column table of identifiers. Identifiers in the first two columns may be repeated; this is caused by the repetition of LOC.
DEPT ⋈ avg(EMPLOYS.EMP.SAL)
returns a two-column table, where each row associates the identifier of a DEPT with a number, the average salary in this department. In SQL this query requires the group by operator; our definitions allow us to avoid it (as well as the having clause).
4.9 Quantifiers
The first idea for dealing with quantifiers is to consider them as aggregate functions: an existential quantifier is a generalized or operator, and the universal quantifier is a generalized and. For example, the query "Is it true that each employee earns more than 1500?" can be expressed as ∀(EMP.(SAL > 1500)). The query EMP.(SAL > 1500) returns a single-column table of truth values, which can be processed by ∀, ∃, or another quantifier. This idea can be generalized as pump [BBKV87], i.e. a higher-level polymorphic operator, which is an encapsulated iteration taking a function and a base value as arguments. Here we follow another idea, which is syntactically closer to the traditional quantifier concept of the predicate calculus and follows our definitional style. The syntax is ∀q(p) and ∃q(p), where q returns a table and p returns a boolean value. The semantics of ∀q(p) is then defined as:
procedure eval(query: string);
begin
  ...
  if query is recognized as ∀q(p) then
  begin
    var RESULT: Boolean;
    RESULT := TRUE;
    eval(q);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));              (* Open a new scope on ES *)
      eval(p);
      RESULT := RESULT ∧ top(QRES);
      pop(QRES);                        (* Cancel the result of p *)
      pop(ES);                          (* Restore the previous state of ES *)
    end;
    pop(QRES);                          (* Cancel the result of q *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);
The definition of ∃q(p) is obtained from the above by changing two lines: the RESULT variable should be initialized to FALSE, and the line collecting the final result should be RESULT := RESULT ∨ top(QRES).
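A Python sketch of ∀q(p) and ∃q(p) following the procedure above, under a hypothetical encoding (queries as nested tuples, tables as lists of row tuples) and with invented salaries. Like the pseudocode, it evaluates p for every row of q rather than stopping early; for an empty q, ∀ yields TRUE and ∃ yields FALSE. The names eval_, bind, sal_gt and run are illustrative only.

```python
class Oid(str):
    """An object identifier, kept distinct from ordinary values."""


# Hypothetical store: three EMP objects, each with a SAL sub-object (invented salaries).
DB = {
    "i1": ("EMP", ["i3"]), "i3": ("SAL", 2500),
    "i5": ("EMP", ["i7"]), "i7": ("SAL", 2000),
    "i9": ("EMP", ["i11"]), "i11": ("SAL", 1500),
}
ES = [[Oid(i) for i, (n, _) in DB.items() if n == "EMP"]]
QRES = []


def nested(row):
    """Identifiers of the sub-objects of the complex objects in a row."""
    out = []
    for e in row:
        if isinstance(e, Oid) and isinstance(DB[e][1], list):
            out.extend(Oid(i) for i in DB[e][1])
    return out


def bind(name):
    for section in reversed(ES):  # search from the top of ES
        found = [(i,) for i in section if DB[i][0] == name]
        if found:
            return found
    return []


def deref(e):
    if isinstance(e, Oid) and not isinstance(DB[e][1], list):
        return DB[e][1]
    return e


def eval_(q):
    op = q[0]
    if op == "name":
        QRES.append(bind(q[1]))
    elif op == "const":
        QRES.append([(q[1],)])
    elif op == ">":
        eval_(q[1])
        eval_(q[2])
        right, left = QRES.pop(), QRES.pop()
        ok = bool(left and right) and deref(left[0][0]) > deref(right[0][0])
        QRES.append([(ok,)])
    elif op in ("forall", "exists"):
        eval_(q[1])
        result = op == "forall"          # RESULT := TRUE for forall, FALSE for exists
        for r in QRES[-1]:
            ES.append(nested(r))         # open a new scope on ES
            eval_(q[2])
            holds = QRES[-1][0][0] is True
            result = (result and holds) if op == "forall" else (result or holds)
            QRES.pop()                   # cancel the result of p
            ES.pop()                     # restore the previous state of ES
        QRES.pop()                       # cancel the result of q
        QRES.append([(result,)])
    else:
        raise ValueError(f"unknown operator {op!r}")


def run(q):
    eval_(q)
    return QRES.pop()


def sal_gt(k):
    """Hypothetical helper building the predicate SAL > k."""
    return (">", ("name", "SAL"), ("const", k))


print(run(("forall", ("name", "EMP"), sal_gt(1500))))  # [(False,)]
print(run(("exists", ("name", "EMP"), sal_gt(2400))))  # [(True,)]
print(run(("forall", ("name", "EMP"), sal_gt(1000))))  # [(True,)]
```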
Examples
Give departments where all programmers used to work for IBM:
DEPT where ∀((EMPLOYS.EMP) where JOB = "programmer")(∃ PREV_JOB (COMPANY = "IBM"))
Consider the supplier-part database with the schema SUPP(SNO, SNAME, ...), PART(PNO, PNAME, ...), SP(SPSNO, SPPNO, ...). Give names of suppliers supplying all parts:
(SUPP where ∀ PART (∃ SP (PNO = SPPNO and SNO = SPSNO))).SNAME
We see some limitations here. Since quantifiers are not associated with (bound) variables, many predicate calculus queries cannot be expressed. Moreover, if the relation SP were defined as SP(SNO, PNO, ...), we would have a conflict between attribute names. The next subsection addresses this problem.
4.10 Bound Variables, "Correlation" Variables, Synonyms
In this subsection we investigate the concept of a variable, inherited by QLs from the predicate calculus. Although in mathematics the concept is semantically clear, this is not so in computer languages. Whenever a name is introduced in a computer language, it raises the binding problem: what kind of data structures it implies, how these structures are manipulated, and how the name will be bound to them. A query language implemented in DBPL [ScMa92], based on the predicate calculus, is an excellent example of this kind of problem. For the construct
FOR EACH x IN EMP: x.JOB = "clerk" DO x.SAL := 3000; END;
the calculus variable x becomes a mutable programming object, which allows updating, as shown above. The variable has an unusual "copy" semantics: it stores a copy of a tuple,
which at the end of the loop is flushed to the original relation. This semantics has consequences that go far beyond the meaning assumed in the relational calculus. Auxiliary names greatly increase the selective power. The possibility of naming structures to be processed (or parts of queries) also supports abstraction and conceptual programming. Auxiliary names are associated not only with quantifiers. In SQL, if a relation is to be joined with itself, we must use "correlation variables" or "synonyms" because of the name conflict. The problem is more complex if we would like to apply such variables to arbitrary queries (not only to stored bulk data), to make updating through them possible (as in DBPL), and at the same time to avoid the copy semantics (which leads to undesirable effects). The problem has many solutions, with various properties and consequences. One of them (the DBPL case presented above) is based on the observation that we already have variables (i.e. named storage objects), and they can be adopted as "calculus" variables. This approach was also implemented experimentally in a predecessor of LOQIS (with pointer semantics), but we then abandoned it; in this paper we do not discuss its negative consequences. We invented and implemented another method, which seems to be free of these disadvantages. We adopted the idea that an auxiliary name temporarily "overrides" the original object name. The new name should be valid only in some context determined by a query; outside this context the auxiliary name should have no meaning. As one would expect, this assumption leads to scoping and binding issues. Since a new name is locally associated with an object, the name should be a property of the environment stack rather than of the storage objects. So far we have not assumed that ES involves names, so we have to change this assumption.
Moreover, we must smoothly incorporate this novelty into the semantic definitions presented so far. We have made some effort to raise the idea to a reasonable generality, which allows us (as we will see in further examples) to achieve additional interesting effects. As usual, we distinguish two kinds of occurrences of auxiliary names in a query: declaration and application. The syntax for the declaration is n ∈ q, where n ∈ N and q is an arbitrary query. The name n can be applied in a query after the declaration, and its scope is syntactically unlimited (the scope is a semantic property). The formal semantics is very simple. The construct n ∈ q associates the name n with each row r of the table produced by q; the new table contains elements which we denote n(r). Such a table is considered single-column. To be consistent with the previous definitions we make the following improvements to the function nested and to the binding:
nested(⟨row⟩): atomic values and identifiers occurring in ⟨row⟩ are treated as before. For elements of the form n(e1 e2 ... ek), nested is the identity function. This means that such elements are copied to the environment stack without changes.
The binding operator: it works according to the previous principles and scoping rules, but binding the name n to an element n(e1 e2 ... ek) on the environment stack returns ⟨e1 e2 ... ek⟩ to the result stack; the name n is not propagated to the result.⁷
7 The semantics of ∈ is essentially different from the classical one, but we keep this symbol because of some tradition in QLs and the associations it evokes concerning how to use it.
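The two changes (nested is the identity on named elements, and binding n to n(e1 ... ek) returns ⟨e1 ... ek⟩) can be sketched in Python as follows. The Named class, the tuple encoding of queries (with ("as", n, q) standing for n ∈ q), the helper names and the store contents are assumptions made for illustration; the query evaluated is (x ∈ EMP) where x.SAL > 1800.

```python
from dataclasses import dataclass


class Oid(str):
    """An object identifier, kept distinct from ordinary values."""


@dataclass(frozen=True)
class Named:
    """A named element n(e1 ... ek), as produced by n ∈ q."""
    name: str
    row: tuple


# Hypothetical store: identifiers as in MyDatabase, other values invented.
DB = {
    "i1": ("EMP", ["i2", "i3"]), "i2": ("NAME", "Brown"), "i3": ("SAL", 2500),
    "i5": ("EMP", ["i6", "i7"]), "i6": ("NAME", "Smith"), "i7": ("SAL", 2000),
    "i9": ("EMP", ["i10", "i11"]), "i10": ("NAME", "Jones"), "i11": ("SAL", 1500),
}
ES = [[Oid(i) for i, (n, _) in DB.items() if n == "EMP"]]
QRES = []


def nested(row):
    """Sub-objects of identifiers; named elements are copied unchanged (identity)."""
    out = []
    for e in row:
        if isinstance(e, Named):
            out.append(e)
        elif isinstance(e, Oid) and isinstance(DB[e][1], list):
            out.extend(Oid(i) for i in DB[e][1])
    return out


def label(e):
    return e.name if isinstance(e, Named) else DB[e][0]


def bind(name):
    """Search ES from the top; binding n to n(e1 ... ek) yields the row (e1 ... ek)."""
    for section in reversed(ES):
        found = [e.row if isinstance(e, Named) else (e,) for e in section if label(e) == name]
        if found:
            return found
    return []


def deref(e):
    if isinstance(e, Oid) and not isinstance(DB[e][1], list):
        return DB[e][1]
    return e


def eval_(q):
    op = q[0]
    if op == "name":
        QRES.append(bind(q[1]))
    elif op == "const":
        QRES.append([(q[1],)])
    elif op == "as":                               # ("as", n, q) stands for n ∈ q
        eval_(q[2])
        QRES.append([(Named(q[1], r),) for r in QRES.pop()])
    elif op == ">":
        eval_(q[1])
        eval_(q[2])
        right, left = QRES.pop(), QRES.pop()
        ok = bool(left and right) and deref(left[0][0]) > deref(right[0][0])
        QRES.append([(ok,)])
    elif op in ("where", "dot"):
        eval_(q[1])
        result = []
        for r in QRES[-1]:
            ES.append(nested(r))                   # open a new scope on ES
            eval_(q[2])
            if op == "dot":
                result.extend(QRES[-1])
            elif QRES[-1] and QRES[-1][0][0] is True:
                result.append(r)
            QRES.pop()                             # cancel the result of q2
            ES.pop()                               # restore the previous state of ES
        QRES.pop()                                 # cancel the result of q1
        QRES.append(result)
    else:
        raise ValueError(f"unknown operator {op!r}")


def run(q):
    eval_(q)
    return QRES.pop()


query = ("where", ("as", "x", ("name", "EMP")),
         (">", ("dot", ("name", "x"), ("name", "SAL")), ("const", 1800)))
res = run(query)
print([(e.name, e.row) for (e,) in res])   # [('x', ('i1',)), ('x', ('i5',))]
print(run(("dot", query, ("name", "x"))))  # [('i1',), ('i5',)]
```

The last line corresponds to appending .x to the query: binding x strips the name, so the result consists of plain identifiers again.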
This semantics of auxiliary variables also has consequences that go beyond the meaning assumed in the predicate calculus. We will employ some unusual consequences of this semantics in the examples of transitive closures and views. We illustrate this feature by examples.
Examples
Cf. MyDatabase. Consider the query
x ∈ EMP
As before, the atomic query EMP returns a single-column table with the identifiers i1, i5, i9. The operator ∈ associates the name x with each identifier; as a result, QRES will contain the following table:
x(i1)
x(i5)
x(i9)
Consider the query
(x ∈ EMP) where (x.SAL > 1800)
The table presented above is processed by the operator where, followed by the predicate x.SAL > 1800. According to the semantic rule for where, for each element of the table we have to create a new scope. For the first element x(i1) we have nested(x(i1)) = x(i1), hence the element is written unchanged at the top of ES. Thus the predicate is evaluated with the following environment stack (top section first):
x(i1)
i1 EMP, i5 EMP, i9 EMP, i13 DEPT, i17 DEPT
The name x occurring in the predicate is bound to the top element of the stack. In effect, the element is written back to QRES, but this time the name x associated with it is cut off. After the binding, the state of QRES is the following (top table first):
i1
x(i1) x(i5) x(i9)
The top table of the stack, consisting of the single element i1, is now processed by the dot operator, followed by the atomic query SAL. According to the semantics of the dot operator, nested(i1) is pushed onto the top of ES; its state will be the following:
i2 NAME, i3 SAL, i4 WORKS_IN
x(i1)
i1 EMP, i5 EMP, i9 EMP, i13 DEPT, i17 DEPT
Now the name SAL is bound as usual, returning i3 to QRES. Below we present the states of QRES until the end of the evaluation (top table first, tables separated by "|"):
After eval(x.SAL): i3 | x(i1) x(i5) x(i9)
After eval(1800) and dereferencing: 1800 | 2500 | x(i1) x(i5) x(i9)
After comparison: TRUE | x(i1) x(i5) x(i9)
Hence x(i1) is accepted. The final result of the whole query is:
x(i1) x(i5)
Note that, in comparison with the previously analysed query EMP where SAL > 1800, the final result is slightly different: each element of the result table is equipped with the name x. The reader can check that the query ((x ∈ EMP) where x.SAL > 1800).x removes x from the above final result, so it will be exactly the same as in the previous case.
Cf. the running example. Give names and department names of employees earning more than their manager:
(a) The relational model, in the tuple-calculus (QUEL and SQL) style:
(((e ∈ EMP) × (m ∈ EMP) × (d ∈ DEPT)) where e.EDNO = d.DNO and d.MGR = m.ENO and e.SAL > m.SAL).(e.NAME × d.DNAME)
(b) The relational model, in the domain-calculus style:
((EMP.((ed ∈ EDNO) × (en ∈ NAME) × (es ∈ SAL)) × EMP.((m ∈ ENO) × (ms ∈ SAL)) × DEPT.((d ∈ DNO) × (dn ∈ DNAME) × (dm ∈ MGR))) where ed = d and m = dm and es > ms).(en × dn)
Consider the supplier-part database. Give suppliers supplying all parts (in the DBPL style):
(s ∈ SUPP) where ∀(p ∈ PART)(∃(q ∈ SP)(s.SNO = q.SNO and q.PNO = p.PNO))
Note that in the last example the variables p and q are "bound", i.e. they do not occur anywhere after the query is evaluated, but s is "unbound", i.e. it occurs in the result returned by the query. Consequences, in particular for the for each construct, will be illustrated in the next section. (End of examples.)
Now a table on the result stack may contain not only elements of I ∪ V, but also elements of the form n(...), where n ∈ N. Because of the assumed orthogonality of operators, there is no reason to forbid cartesian products of such tables. Hence we can obtain rows that mix
elements of I, V and n(...). Again, the operator ∈ can be applied to such rows, which means that the result stack can contain elements, e.g., of the form n1(i1, v1, n2(...)). Extending this line of thinking, we must allow arbitrarily nested labelled lists as elements of the structures stored on both stacks. In particular, the classical concept of a tuple <a1 : v1, ..., ak : vk> belonging to a relation named r can be represented as r(a1(v1) ... ak(vk)). In this unusual way we have arrived at the concept of a complex value which can be manipulated directly on the stacks. During the development of LOQIS we considered other extensions to the semantic domains, in particular the case when an auxiliary name is assigned and bound to a whole table (not only to a row). This extension allows us to consider grouping and nested relations, and there are examples where such queries are reasonable. However, we also have to achieve a trade-off between the complexity of the language and its universality, because overly complex semantics is not accepted by users. For this reason programming languages, as a rule, restrict the class of elements which can be manipulated on stacks; nevertheless, they are sufficiently universal because of other capabilities.
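To make the role of binders concrete, here is a minimal Python sketch under loudly stated assumptions: the store, the identifiers i1 to i7, and the helpers nested, bind and deref are hypothetical stand-ins for the abstract storage model, not LOQIS code. It evaluates ((x ∈ EMP) where x.SAL > 1800) and shows that each result row is a labelled element x(i), which the final .x strips off.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Binder:
    """A labelled element n(v) that may appear in a QRES table or an ES section."""
    name: str
    value: Any


# Hypothetical object store: identifier -> (name, value); a complex object's
# value is the list of identifiers of its sub-objects.
STORE: Dict[str, Tuple[str, Any]] = {
    "i1": ("EMP", ["i2", "i3"]), "i2": ("NAME", "Smith"), "i3": ("SAL", 2500),
    "i5": ("EMP", ["i6", "i7"]), "i6": ("NAME", "Jones"), "i7": ("SAL", 1500),
}
ROOT = ["i1", "i5"]


def nested(elem: Any) -> List[Binder]:
    """Section opened on ES: a binder opens itself, a complex object opens its sub-objects."""
    if isinstance(elem, Binder):
        return [elem]
    _, value = STORE[elem]
    return [Binder(STORE[s][0], s) for s in value] if isinstance(value, list) else []


def bind(es: List[List[Binder]], name: str) -> List[Any]:
    """Search ES from the top; the first section containing the name wins."""
    for section in reversed(es):
        hits = [b.value for b in section if b.name == name]
        if hits:
            return hits
    return []


def deref(v: Any) -> Any:
    return STORE[v][1] if isinstance(v, str) and v in STORE else v


def named_where_sal_over(limit: int) -> List[Binder]:
    """((x ∈ EMP) where x.SAL > limit): each result row is a binder x(i)."""
    es = [[Binder(STORE[i][0], i) for i in ROOT]]      # base section: root objects
    rows = [Binder("x", i) for i in bind(es, "EMP")]   # x ∈ EMP
    result = []
    for row in rows:
        es.append(nested(row))      # scope of the where predicate holds x(i)
        (emp,) = bind(es, "x")      # x binds to the EMP identifier
        es.append(nested(emp))      # x.SAL: open the EMP's sub-objects
        sal = deref(bind(es, "SAL")[0])
        es.pop()
        es.pop()
        if sal > limit:
            result.append(row)
    return result


rows = named_where_sal_over(1800)
print(rows)                     # [Binder(name='x', value='i1')]
print([r.value for r in rows])  # ".x" strips the name: ['i1']
```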
4.11 Transitive Closures
The transitive closure makes it possible to process recursive data structures and to encapsulate some non-trivial iterations; thus it greatly extends the power of QLs [AhUl79]. There are two approaches to defining the operator. The first one assumes that the relation to be closed is explicitly stored as a permanent or temporary table. Since computing its closure can be expensive, many papers are devoted to efficient algorithms. The second approach concerns how to make the operator computationally powerful. Examples of tasks requiring such a transitive closure concept are the following: (1) for a data structure describing parts and subparts, which contains information about the quantities of subparts and the weights of atomic parts, get the total weight of a given complex part; (2) calculate the least fixed point of some numerical equation x = f(x); (3) define the aggregate function sum; (4) calculate the shortest path in a graph; etc. This concept of the transitive closure assumes that the relation to be closed is implicitly defined by some complex expressions; it may happen that the relation cannot be physically stored, since it is too large or infinite. Below we follow the second approach (which does not exclude the first one, if the relation can be efficiently stored in extenso). The transitive closure "explodes" a set of initial elements according to some relation r. An element b is inserted into the set if there is already an element a in the set such that <a, b> ∈ r. Both the initial elements and the relation can be determined by queries. Let q1 be a query determining the initial set. The elements collected in the first step of the explosion can be determined by the query q1.q2, where the query q2 navigates from the initial elements to their direct successors in the closure. Analogously, the query q1.q2.q2 determines the elements collected in the second step of the explosion, q1.q2.q2.q2 determines the elements collected in the third step, and so on.
We can therefore represent the transitive closure as an infinite union
q1 ∪ q1.q2 ∪ q1.q2.q2 ∪ q1.q2.q2.q2 ∪ ...
This union is denoted by q1 closed by q2. The semantics follows our standard method, through a modification of the definition of the dot operator. Recall that q1.q2 repeats the evaluation of q2 for each row returned by q1. The query q1 closed by q2 does the same, but the result of each evaluation of q2 is added not to a temporary table but to the table returned by q1. In this way the rows returned by q2 are further processed like the original rows of q1. The process terminates when, for the last row of the processed table, the table returned by q2 is empty. The definition of the semantics is the following:
procedure eval(query: string);
begin
  ...
  if query is recognized as q1 closed by q2 then
  begin
    var NEXTSTEP: Table;
    eval(q1); (* The table returned by q1 is placed on QRES *)
    for each r ∈ top(QRES) do (* Also visits rows appended during the loop *)
    begin
      push(ES, nested(r)); (* Open a new scope on ES *)
      eval(q2);
      NEXTSTEP := top(QRES); (* Store the result of q2 *)
      pop(QRES); (* Cancel the result of q2 *)
      top(QRES) := top(QRES) ⊔ NEXTSTEP; (* Extend the table returned by q1 *)
      pop(ES); (* Restore the previous state of ES *)
    end;
  end
  else ...
end (*eval*);
Examples
We take the example from [AtBu87]. The database schema is the following (or denotes exclusive variants):
repeating Part( Name(string) ( Base( Mass(real) ... ) or Composite( repeating MadeFrom( Uses(↑Part) Quantity(integer) ... ))))
The database contains information on base and composite parts. All parts have the attribute Name. A base part additionally has the attribute Mass, and a composite part has information about its direct components: they are determined as a collection of pairs <p, q>, where p is a pointer to a component, and q is the quantity of the component in the part.
Get all parts recursively composing a part named \engine":
Part where (Name = "engine") closed by (Composite.MadeFrom.Uses.Part)
Compute the total mass of the engine:
sum(
(((Part where Name = "engine") (q ∈ 1))
closed by Composite.MadeFrom.(Uses.Part (q ∈ (q ∗ Quantity))))
.((Composite.0 ∪ Base.Mass) ∗ q))
The sub-query in the 2nd line returns a two-column table consisting of one row, where the first element is the identifier of the required part, and the second is the quantity of this part, denoted q and equal to 1. In the 3rd line we navigate from the part to its sub-parts, simultaneously computing their quantities. This process is repeated for each sub-part; thus as the result we obtain a two-column table with the identifiers of all parts participating in the "engine", together with their quantities. In the 4th line each row of this table is projected onto the product of the mass and the quantity; the result is a single-column table of numbers. The aggregate function sum computes the total of these numbers.
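The closure semantics and the mass computation can be sketched in Python as follows. This is a hedged model, not the LOQIS implementation: the function closed_by, the PARTS dictionary and its numbers are assumptions made for illustration. The key point is that rows produced by the step are appended to the very table being scanned, and duplicates are kept, as the parts example requires.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # (part identifier, quantity q)


def closed_by(initial: List[Row], step: Callable[[Row], List[Row]]) -> List[Row]:
    """q1 closed by q2: step(row) stands for eval(q2) in the scope nested(row).
    Its result is appended to the table being scanned, so new rows are processed
    too; the loop stops when the scan reaches the end of the table."""
    table = list(initial)             # eval(q1)
    i = 0
    while i < len(table):             # for each r in top(QRES), incl. appended rows
        table.extend(step(table[i]))  # top(QRES) := top(QRES) ⊔ NEXTSTEP
        i += 1
    return table


# Hypothetical parts: composites list (sub-part, Quantity); base parts have a Mass.
PARTS: Dict[str, dict] = {
    "engine": {"MadeFrom": [("piston", 4), ("crank", 1)]},
    "piston": {"MadeFrom": [("ring", 3)]},
    "crank": {"Mass": 12.0},
    "ring": {"Mass": 0.1},
}


def subparts(row: Row) -> List[Row]:
    """Composite.MadeFrom.(Uses.Part (q ∈ (q * Quantity))) for one row."""
    part, q = row
    return [(uses, q * quantity) for uses, quantity in PARTS[part].get("MadeFrom", [])]


def total_mass(name: str) -> float:
    rows = closed_by([(name, 1)], subparts)
    # (Composite.0 ∪ Base.Mass) * q: composites contribute 0, base parts Mass * q
    return sum(PARTS[p].get("Mass", 0.0) * q for p, q in rows)


print(closed_by([("engine", 1)], subparts))
# [('engine', 1), ('piston', 4), ('crank', 1), ('ring', 12)]
print(round(total_mass("engine"), 6))  # 13.2
```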
Assume that the query q returns a number. Give a query computing √q according to the fixed-point equation x = (q/x + x)/2, starting from x = 1 and making 15 iterations.
((((x ∈ 1) (c ∈ 1)) closed by (((x ∈ ((q/x + x)/2)) (c ∈ (c + 1))) where c ≤ 15)) where c = 15).x
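A small Python sketch of this query, reusing the assumed closed_by model (redefined so that the block stands alone): the step builds the row (x, c + 1) and keeps it only while c ≤ 15, and the final filter c = 15 plays the role of the outer where. The function name sqrt_by_closure is hypothetical. Note that the closure table retains all 15 rows, although only the last one is used.

```python
import math
from typing import Callable, List, Tuple

Row = Tuple[float, int]  # (x, c)


def closed_by(initial: List[Row], step: Callable[[Row], List[Row]]) -> List[Row]:
    table = list(initial)
    i = 0
    while i < len(table):
        table.extend(step(table[i]))
        i += 1
    return table


def sqrt_by_closure(q: float, last: int = 15) -> float:
    def step(row: Row) -> List[Row]:
        x, c = row
        new_row = ((q / x + x) / 2, c + 1)  # (x ∈ ((q/x + x)/2)) (c ∈ (c + 1))
        return [new_row] if new_row[1] <= last else []  # ... where c ≤ 15

    rows = closed_by([(1.0, 1)], step)            # ((x ∈ 1) (c ∈ 1)) closed by ...
    return [x for x, c in rows if c == last][0]   # (... where c = 15).x


print(sqrt_by_closure(2.0), math.sqrt(2.0))
```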
The auxiliary name c represents the counter of iterations. Constructs such as c ∈ (c + 1) resemble assignments from programming languages, but their semantics is different: the right-hand c refers to the currently processed (existing) row, and the left-hand c participates in the construction of a new row. We implemented this variant of the transitive closure in LOQIS, so the above examples and many others were carefully checked. After the implementation we realized that the operator is insufficient for some tasks; thus we implemented other variants. The last example shows the case when only the last produced row is essential for the next step of the closure; intermediate rows can be removed on the fly, which results in shorter queries and better performance. Thus we introduced the syntactic variant q1 leaves by q2. The last example can be formulated as follows:
(((x ∈ 1) (c ∈ 1)) leaves by (((x ∈ ((q/x + x)/2)) (c ∈ (c + 1))) where c ≤ 15)).x
The next problem concerns duplicate rows. In the example with parts they should not be removed, but in examples where the graph to be closed contains cycles they must be removed; otherwise the process would never terminate. Theoretically, we can apply the function distinct, which removes duplicates, but this leads to unsafe computations. For example, distinct(1 closed by 2) should return a table with 1 and 2, but its evaluation never terminates. Hence we introduced a special syntax for the case when duplicates should be removed on the fly.
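The two variants discussed above can be sketched in Python as follows. Both functions are hypothetical illustrations: leaves_by encodes one plausible reading of leaves by (a row is kept only if the step produces nothing from it), and closed_by_distinct stands in for the unnamed duplicate-removing syntax; neither is the LOQIS definition. On the cyclic graph, a plain closure would loop forever, whereas the duplicate-removing version stops.

```python
from collections import deque
from typing import Callable, Hashable, List, TypeVar

R = TypeVar("R", bound=Hashable)


def leaves_by(initial: List[R], step: Callable[[R], List[R]]) -> List[R]:
    """Assumed reading of 'q1 leaves by q2': intermediate rows are dropped on the
    fly; only rows from which the step produces nothing are kept."""
    pending = deque(initial)
    leaves: List[R] = []
    while pending:
        row = pending.popleft()
        successors = step(row)
        if successors:
            pending.extend(successors)  # the intermediate row itself is discarded
        else:
            leaves.append(row)
    return leaves


def closed_by_distinct(initial: List[R], step: Callable[[R], List[R]]) -> List[R]:
    """Closure that removes duplicates on the fly; it terminates whenever the set
    of reachable rows is finite, even if the graph has cycles."""
    table: List[R] = []
    seen = set()
    for row in initial:
        if row not in seen:
            seen.add(row)
            table.append(row)
    i = 0
    while i < len(table):
        for new in step(table[i]):
            if new not in seen:
                seen.add(new)
                table.append(new)
        i += 1
    return table


def newton_step(q: float):
    def step(row):
        x, c = row
        return [((q / x + x) / 2, c + 1)] if c + 1 <= 15 else []
    return step


GRAPH = {"a": ["b"], "b": ["a", "c"], "c": []}  # hypothetical cyclic graph

print(leaves_by([(1.0, 1)], newton_step(2.0)))        # a single row, with c = 15
print(closed_by_distinct(["a"], lambda n: GRAPH[n]))  # ['a', 'b', 'c']
```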
4.12 Ordering
In the relational model the ordering of data is not considered a conceptual issue. In contrast, object-oriented systems (e.g. O2 [Deux+90]) and DBPLs (e.g. Galileo [ACO85]) deal with ordering (introducing lists or sequences). Ordering is extremely important in real systems: benchmarks for typical data processing systems have shown that more than half of the execution time is devoted to sorting. In the DBTG CODASYL proposal [CODA71] it was possible to keep several orders of the same record type simultaneously (via special "sets"), which may greatly support performance. Most QL features are independent of data ordering, and for relational database theories ordering is an inconvenient feature. This is perhaps the reason for the popular belief that ordering is not a conceptual issue and is necessary only for forming the final result. Ordering, however, is important for data modelling, performance, visualization of the output, and querying. Many reasonable queries require ordering, e.g. "Give departments where all 50 best-paid employees are clerks". The query is easy to formulate if a QL contains an ordering operator and an operator selecting the first n rows of a table. Taking and generalizing the ordering concept of SQL and QUEL, we introduce the operator order by through a modification of the operator ⋈. Assume the syntax q1 order by q2. Semantically, we make the join of q1 and q2, sort the result according to the columns produced by q2, and then project onto the columns produced by q1. Let col_nbr(q) denote the number of columns of the table returned by the query q. The definition of the semantics of q1 order by q2 is the following:
procedure eval(query: string);
begin
  ...
  if query is recognized as q1 order by q2 then
  begin
    eval(q1 ⋈ q2); (* Join each row of q1 with the rows produced by q2 *)
    sort the top(QRES) table according to its col_nbr(q2) last columns;
    remove from top(QRES) its col_nbr(q2) last columns; (* Project onto the columns of q1 *)
  end
  else ...
end (*eval*);
Examples
EMP order by NAME
DEPT order by count(EMPLOYS)
Let the function first(n: integer; t: Table): Table return the first n rows of the table t. "Get departments where all 50 best-paid employees are clerks" can be formulated as the following query:
DEPT where ∀(first(50, EMPLOYS.EMP order by (−SAL)))(JOB = "clerk")
In [Ott92] similar capabilities are proposed for SQL. In LOQIS the order by operator is supported by another facility: a table returned by a query is equipped with a standard additional column storing elements of the form number(<row_nbr>), where number is an auxiliary name and <row_nbr> is the successive row number, starting from 1. (The column is virtual; it is not physically stored.) This feature turns out to be convenient for low-level operations on tables. In combination with sorting and transitive closures it seems to be more powerful than the pump operator of FAD [BBKV87] and the hom operator of Machiavelli [OBB89]; in particular, we can show that it can be used to define all popular aggregate functions.
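The join-sort-project definition of order by, together with first and the virtual number column, can be sketched in Python. The sketch assumes that q2 returns a list of key tuples per row, so a row joined with several keys appears several times, as a join would produce; the data and the function names are hypothetical. Python's sort is stable, so rows with equal keys keep their original relative order, which the definition above does not prescribe.

```python
from typing import Any, Callable, List, Tuple


def order_by(table: List[Any], q2: Callable[[Any], List[Tuple]]) -> List[Any]:
    """q1 order by q2: join each row with the key rows q2 returns for it,
    sort on the key columns, then drop them (projection onto q1's columns)."""
    joined = [(row, key) for row in table for key in q2(row)]  # eval(q1 ⋈ q2)
    joined.sort(key=lambda pair: pair[1])                      # stable sort on key columns
    return [row for row, _ in joined]


def first(n: int, table: List[Any]) -> List[Any]:
    return table[:n]


def with_number(table: List[Any]) -> List[Tuple[int, Any]]:
    """Virtual column number(<row_nbr>), counted from 1."""
    return list(enumerate(table, start=1))


# Hypothetical departments with employees as (JOB, SAL).
DEPTS = {
    "Toys": [("clerk", 1200), ("clerk", 1500), ("manager", 900)],
    "Sales": [("clerk", 1300), ("analyst", 2500), ("clerk", 1100)],
}


def best_paid_all_clerks(n: int) -> List[str]:
    """DEPT where ∀(first(n, EMPLOYS.EMP order by (−SAL)))(JOB = "clerk")"""
    return [d for d, emps in DEPTS.items()
            if all(job == "clerk"
                   for job, _ in first(n, order_by(emps, lambda e: [(-e[1],)])))]


print(best_paid_all_clerks(2))  # ['Toys']
```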
Examples
Give the median of salaries (this cannot be expressed by pump or hom):
((EMP order by SAL) where number = entier((count(EMP) + 1)/2)).SAL
In statistics we frequently need to remove extreme observations. Give the average salary, removing the 5 lowest and the 5 highest salaries (this cannot be expressed by pump or hom):
avg(((EMP order by SAL) where number > 5 ∧ number ≤ count(EMP) − 5).SAL)
Ordering implies a change in the understanding of database instances and of the output from queries. Previously we assumed that the sub-objects of a complex object form a set. Now we must assume that they may form a sequence, or a set of sequences. This also concerns the results of queries stored on QRES. Since sequences are the most informative (i.e. they conceptually cover bags and sets), they could be chosen as the single structure among the three. Such an idea simplifies the semantic problem, but it has disadvantages. Many collections of real objects behave as sets or bags; thus the system would not support this aspect of conceptual modelling. Another disadvantage concerns query optimization: some methods do not work if we assume that the order of tuples in the result must be preserved.
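A Python sketch of the two queries, modelling (EMP order by SAL) with its number column as (number, SAL) pairs. The function names are hypothetical; median picks the row with number = (n + 1) div 2, i.e. the lower median for an even number of rows, which is an assumption of this sketch.

```python
from typing import List, Tuple


def numbered_by_sal(sals: List[float]) -> List[Tuple[int, float]]:
    """(EMP order by SAL) with the virtual number column, as (number, SAL) pairs."""
    return list(enumerate(sorted(sals), start=1))


def median(sals: List[float]) -> float:
    """Row with number = (count + 1) div 2, i.e. the lower median."""
    k = (len(sals) + 1) // 2
    return [s for number, s in numbered_by_sal(sals) if number == k][0]


def trimmed_avg(sals: List[float], cut: int = 5) -> float:
    """avg(((EMP order by SAL) where number > cut ∧ number ≤ count(EMP) − cut).SAL)"""
    n = len(sals)
    kept = [s for number, s in numbered_by_sal(sals) if cut < number <= n - cut]
    return sum(kept) / len(kept)


print(median([3000, 1000, 2000]))       # 2000
print(trimmed_avg(list(range(1, 13))))  # 6.5
```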
4.13 Null Values and Variants
Null values and variants cause problems for QLs. The first concerns the necessity of special care in queries. Consider the query EMP where SAL > 1800. If for some employee SAL is null-valued, then binding the name SAL returns the empty table, which has to be compared by the comparison > with 1800. Both results of the comparison, TRUE and FALSE, are unacceptable. On the other hand, the situation is quite normal, so it should be possible to avoid a run-time error. Special approaches to this problem have been proposed, see for example [Codd79, Zani83]. In SQL such cases are handled by the operators is [not] null:
EMP where is not null(SAL) and SAL > 1800
EMP where is null(SAL) or SAL > 1800
The first query does not include employees with a null-valued salary in the output, and the second does. In both cases the boolean operators and, or have a special semantics (corresponding to the Ada operators "and then" and "or else") that avoids redundant evaluation; thus the comparison of the empty table with the numerical value never occurs. Since null values are coded by the write-nothing rule, we already have the means to avoid special operators acting on null values or a special many-valued logic. Equivalents of the above two queries can be expressed by quantifiers:
EMP where ∃(s ∈ SAL)(s > 1800)
EMP where ∀(s ∈ SAL)(s > 1800)
or by the function exists:
EMP where exists(SAL) and SAL > 1800
EMP where not exists(SAL) or SAL > 1800
An example of the treatment of variants was shown previously in the query computing the total mass of the engine; thus this problem is not conceptually challenging. Note that our treatment of null values within the arguments of aggregate functions (null values do not influence the result) is the same as in SQL. The useful SQL function ifnull(q, "when empty") can be expressed as q ∪ ("when empty" where (not exists(q))).
The second problem connected with null values and variants is more serious and concerns binding and scoping rules. Consider the query "Get employees earning the same salary as Brown":
EMP where SAL = ((EMP where NAME = "Brown").SAL)
What will happen if Brown's salary is null-valued? Intuitively the query should cause a run-time error, but let us follow the formal semantics.
During binding of the second occurrence of the symbol SAL the environment stack is the following:
Top: identifiers of objects ENO, NAME, JOB, ... contained in Brown's EMP object
Middle: identifiers of objects ENO, NAME, SAL, JOB, ... contained in the currently tested EMP object
Bottom: identifiers of objects DEPT and EMP
Since the top section does not contain a pointer to a SAL object, the search continues in lower sections of the stack. Unfortunately, the second section contains a pointer to SAL, so the binding succeeds. As a result, the predicate after the first where is TRUE for any EMP having a SAL sub-object; obviously, this is a wrong result. The problem is caused by the scoping rules. The search should be finished at the top of the stack, but the model contains no information about the semantic qualification of the second occurrence of the name SAL in the query. This information should be given explicitly and should influence the scoping rules. It can be introduced by types: the names of the sub-objects of an object (possibly not actually present in the object) can be deduced from its type. Hence some elements of types and static binding cannot be avoided in the presented approach. This modification of the binding mechanism also has a performance advantage: for each name occurring in a query we can statically determine the section of the environment stack where the binding has to be done, so the dynamic search down the stack can be avoided. In the following we employ this property for query optimization. The same considerations apply to auxiliary variables.
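The two problems of this section can be illustrated with a hedged Python sketch. Null values are modelled by the write-nothing rule as empty lists, so ∃ over an empty SAL is false and ∀ over it is true, matching the two quantifier queries. The environment stack for the Brown query is a list of sections (bottom first); bind_dynamic searches down the stack and binds SAL wrongly, while bind_static uses a section position assumed to be fixed at compile time. All names and data are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Write-nothing rule: a null-valued SAL is simply an empty table.
EMPS = [{"NAME": "Ann", "SAL": [2000]}, {"NAME": "Bob", "SAL": []}, {"NAME": "Cy", "SAL": [1500]}]


def exists_where(table: List[Any], pred: Callable[[Any], bool]) -> bool:
    return any(pred(v) for v in table)  # ∃(s ∈ SAL)(pred)


def forall_where(table: List[Any], pred: Callable[[Any], bool]) -> bool:
    return all(pred(v) for v in table)  # ∀(s ∈ SAL)(pred)


high = [e["NAME"] for e in EMPS if exists_where(e["SAL"], lambda s: s > 1800)]
high_or_null = [e["NAME"] for e in EMPS if forall_where(e["SAL"], lambda s: s > 1800)]


def ifnull(table: List[Any], default: Any) -> List[Any]:
    """q ∪ (default where not exists(q))"""
    return table + ([default] if not table else [])


# ES during binding of the second SAL in the Brown query (bottom first).
ES: List[Dict[str, List[str]]] = [
    {"DEPT": ["d1"], "EMP": ["e1", "e2"]},           # database section
    {"ENO": ["o1"], "NAME": ["n1"], "SAL": ["s1"]},  # currently tested EMP
    {"ENO": ["o2"], "NAME": ["n2"]},                 # Brown's EMP, SAL is null
]


def bind_dynamic(es: List[Dict[str, List[str]]], name: str) -> List[str]:
    """Search down the stack until some section contains the name."""
    for section in reversed(es):
        if name in section:
            return section[name]
    return []


def bind_static(es: List[Dict[str, List[str]]], name: str, depth: int) -> List[str]:
    """Bind only in the section chosen at compile time (depth 0 = top)."""
    return es[-1 - depth].get(name, [])


print(high, high_or_null)               # ['Ann'] ['Ann', 'Bob']
print(bind_dynamic(ES, "SAL"))          # ['s1']: wrongly the tested EMP's salary
print(bind_static(ES, "SAL", depth=0))  # []: Brown's salary is null
```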
5 Procedural Constructs
Many PL constructs can be adopted to extend the power of a QL: data creation, assignments, insertions, deletions, and control commands (if...then...else, for, while, repeat, case, etc.). Variants of them can be found in DBPL and LOQIS. Here we present two constructs which are important for procedural many-data-at-a-time processing, namely assignments (updating) and for each.
5.1 Assignments
The semantics of an assignment l := r in programming languages assumes that the left-hand side l is evaluated to an identifier (l-value), and the right-hand side r is evaluated to a value (r-value); then the operator assigns the value to the object with this identifier. The problem is slightly more complicated if the language deals with pointers and complex values. This syntax can be extended to QLs. We can discuss three methods for such an extension:
The APL method: l and r are vectors of equal size, and the i-th value of r is assigned to the object identified by the i-th identifier of l.
The DBPL method: l and r are sets of identifiers and values, respectively; all data pointed to by l are deleted, and then a new collection of data is created with the values determined by r.
The SQL method: first, a query returns tuples determining a context for updating; then l and r concern the updating of attributes for each tuple inside this context.
The APL and DBPL methods have disadvantages. The APL method relies on data order, which in many cases is irrelevant. Both APL and DBPL cause syntactic redundancy; for example, the assignment "Raise by 100 the salary of all suppliers working in the Toy department" must be coded as
((DEPT where DNAME = "Toy").EMPLOYS.EMP where JOB = "supplier").SAL := ((DEPT where DNAME = "Toy").EMPLOYS.EMP where JOB = "supplier").(SAL + 100)
For the APL method we must sometimes predict the size of tables, which is harmful. For example, "Give salary 5000 to all clerks" cannot be formulated as
(EMP where JOB = "clerk").SAL := (5000 ⊔ 5000 ⊔ 5000 ⊔ ...)
Similarly, if the assignment concerns different tables or null-valued data, it may be hard to ensure that the tables have equal size. The DBPL method causes difficulties with object orientation, which assumes that object identities are invariant during the objects' lifetime. Below we generalize the update statement of SQL, which avoids these disadvantages. Assume the syntax := q, where q is a query returning a two-column table; the first column stores identifiers (it corresponds to the l-value) and the second column stores values (it corresponds to the r-value). Consequently, the last two examples can be written as
:= ((DEPT where (DNAME = "Toy")).EMPLOYS.EMP where (JOB = "supplier")).(SAL (SAL + 100))
and
:= (EMP where (JOB = "clerk")).(SAL 5000)
To increase readability, in the typical 1:1 case we also use the traditional syntax. The assignment can concern complex objects. The naturally extended semantics assumes deleting the sub-objects of the object pointed to by the left-hand side, and then copying "into" it the sub-objects determined by the right-hand side. It is necessary to distinguish syntactically the assignment of complex values from the assignment of pointers. This distinction is usually determined by types. Since so far our semantic framework is untyped, we use an ad hoc syntax: := pointer means the assignment of a pointer. For example,
:= pointer Y (EMP where NAME = "Smith")
means that the identifier of Smith's object is assigned as the value of the object Y.
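A minimal Python sketch of the generalized := q, under the assumption (not stated above) that q is evaluated completely before any object is updated. Objects are modelled as a hypothetical dictionary from identifiers to values; each row of the two-column table pairs an l-value with an r-value, so no table sizes have to be predicted.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical store of atomic objects: identifier -> value.
STORE: Dict[str, Any] = {"s1": 1500, "s2": 2200, "s3": 900}
# Hypothetical EMP objects: JOB holds a value, SAL holds an identifier.
EMPS: List[Dict[str, Any]] = [
    {"JOB": "supplier", "SAL": "s1"},
    {"JOB": "clerk", "SAL": "s2"},
    {"JOB": "supplier", "SAL": "s3"},
]


def assign(pairs: Iterable[Tuple[str, Any]]) -> None:
    """':= q' for a two-column table of (identifier, value) rows.
    Assumption: q is evaluated completely before any object is updated."""
    rows = list(pairs)  # materialize the whole result of q first
    for oid, value in rows:
        STORE[oid] = value


# := (EMP where JOB = "supplier").(SAL (SAL + 100))
assign((e["SAL"], STORE[e["SAL"]] + 100) for e in EMPS if e["JOB"] == "supplier")
# := (EMP where JOB = "clerk").(SAL 5000): the number of clerks need not be known
assign((e["SAL"], 5000) for e in EMPS if e["JOB"] == "clerk")
print(STORE)  # {'s1': 1600, 's2': 5000, 's3': 1000}
```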
5.2 For each statements
Statements for each can be introduced into the language with the syntax for each q do s, where s is a statement or a sequence of statements enclosed in the brackets begin and end. The semantics follows our definitional pattern, in which s is iteratively executed in the environments determined by the tuples returned by q. Similarly to eval, execute(s: string) is a recursive procedure with side effects on the state (a database instance, ES, and QRES), which determines the semantics of s (i.e. it "executes" s). The semantics of the construct is the following:
procedure execute(statement: string);
begin
  ...
  if statement is recognized as for each q do s then
  begin
    eval(q); (* The table returned by q is placed on QRES *)
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r)); (* Open a new scope on ES *)
      execute(s);
      pop(ES); (* Restore the previous state of ES *)
    end;
    pop(QRES); (* Cancel the result of q *)
  end
  else ...
end (*execute*);
We show by examples that this definition is quite powerful and free of disadvantages such as copy semantics, non-orthogonality, and limited capabilities.
Examples
for each EMP where JOB = "clerk" do
begin
  SAL := SAL + 100; newline; print(NAME);
end
for each (x ∈ (2 ⊔ 3 ⊔ 5 ⊔ 7)) ⋈ (y ∈ sqrt(x)) do
begin
  print(x y); newline;
end
for each DEPT ⋈ (MANAGER.EMP) do
begin
  print("Department :" DNAME "Manager :" NAME); newline;
end
for each (s ∈ SUPP) where ∀(p ∈ PART)(∃(q ∈ SP)(s.SNO = q.SNO and q.PNO = p.PNO)) do
  s.SAL := s.SAL + 100;
The last example resembles DBPL, but the semantics of s is essentially different: no copy of a SUPP object is made. This makes it possible to avoid other anomalies.
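The execute pattern for for each can be sketched in Python. In this hypothetical model an ES section is a reference to the current object itself, so the body updates the database object rather than a copy; binding is simplified to a lookup in the top section only. Data and function names are illustrative.

```python
from typing import Any, Callable, Dict, List

Obj = Dict[str, Any]

# Hypothetical EMP objects; an ES section is modelled as a reference to the object.
DB: List[Obj] = [
    {"NAME": "Ann", "JOB": "clerk", "SAL": 1000},
    {"NAME": "Bob", "JOB": "analyst", "SAL": 3000},
    {"NAME": "Cy", "JOB": "clerk", "SAL": 1200},
]
printed: List[str] = []


def execute_for_each(rows: List[Obj], body: Callable[[List[Obj]], None]) -> None:
    """for each q do s: run s once per row, with nested(row) on top of ES."""
    es: List[Obj] = []
    for r in rows:    # rows = result of eval(q)
        es.append(r)  # push(ES, nested(r)); the section refers to r, not a copy
        body(es)      # execute(s)
        es.pop()      # restore the previous state of ES


def raise_and_print(es: List[Obj]) -> None:
    top = es[-1]      # SAL and NAME are bound in the top section
    top["SAL"] += 100
    printed.append(top["NAME"])


# for each EMP where JOB = "clerk" do begin SAL := SAL + 100; newline; print(NAME); end
execute_for_each([e for e in DB if e["JOB"] == "clerk"], raise_and_print)
print([e["SAL"] for e in DB], printed)  # [1100, 3000, 1300] ['Ann', 'Cy']
```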
5.3 Procedures
In the database domain, views and database procedures have a special conceptual and pragmatic meaning: views are understood as virtual data derived from stored data (usually determined by a query), and database procedures are understood as stored procedures written in a QL. There are many other terms denoting similar concepts, for example virtual (derived) attributes, methods, rules, selectors, constructors, etc. Essentially all these concepts can be considered particular cases of the well-known concept of a procedure. Unfortunately, this analogy is not well recognized in the database domain. Approaching database procedures and views from this side can be very fruitful, since programming procedures have well-established auxiliary concepts, theory, and implementation state of the art, see e.g. [WaGo84]. Procedures, especially in the context of complex data, pointer-valued data, and declarative QLs, present a variety of ideas. Before fixing our proposal, we discuss some possibilities.
Semantics of procedure calls. Historically, the earliest semantics of procedure calls was based on textual substitution: when the program control reaches a procedure call, the call is textually substituted by the procedure body, with simultaneous substitution of the formal parameters by the actual parameters. (This is retained in some languages, e.g. in C macros.) Because of its obvious disadvantages this semantics has been abandoned. We mention it for one reason only: the processing and optimization of queries involving views (so-called query modification or rewriting techniques) are essentially based on this technique. The most popular semantics is stack-based: local environments and actual parameters are pushed onto the environment stack, which allows recursive procedures and, through scoping rules and static binding, supports the locality of identifiers and good performance. Recently, in connection with logic programming, another kind of semantics has been considered, called fixed-point semantics. A procedure name p denotes a set of mathematical objects satisfying a system of fixed-point equations and constraints. The fixed-point semantics can be considered for any expression-oriented language, in particular for a QL from the class we are dealing with.⁸ The idea is realized in the constructors of DBPL [ScMa92, ERMS91] and in other languages integrating procedural and declarative programming [Mant91, HFLP89]. Consistent integration of this idea with the locality of objects, with procedural constructs (for updating, programming interactive scenarios, etc.), and with fine programming abstractions may result in greater programming comfort, power and conceptual modelling support. So far, the idea leads to problems, e.g. how to assure good performance, safe computations, and pragmatic universality. We follow the classical stack-based semantics. In the particular case when there are no local environments, this semantics is equivalent to textual substitution.
Local environments and scoping rules. A procedure can introduce local objects, which are removed when the procedure terminates. As shown in Fig. 3, this can be consistently implemented by storing the objects in the independent pool and stacking only their identifiers on ES. The stacking supports nested procedure calls and recursion.
⁸ Ullman claims that this kind of semantics can be consistently introduced only for logic-based (value-oriented) languages, see e.g. [Ullm91]; the claim has been subject to some criticism in the database literature.
Parameter transmission. There are several semantically different methods of dealing with parameters. We mention call-by-name, call-by-value, call-by-reference, and call-by-need. Call-by-name, an old method discussed in the context of Algol-60, is essentially a technique of macro-substitution, where the formal parameters inside a procedure body are textually substituted by the actual parameters. As in the case of the semantics of procedure calls, we mention this method only because of query optimization based on rewriting. In the Pascal family two techniques are applied: call-by-value, where the values of parameters are evaluated before being passed to the procedure body, and call-by-reference, where the parameter after evaluation is a reference. The call-by-need method (sometimes called lazy evaluation) postpones the evaluation of a parameter until the moment it is needed in the body, sometimes preventing unnecessary computations. An interesting approach to parameter passing is implemented in INGRES/Windows 4GL [Ingr90]. We call it call-by-union. Parameters are local objects of a procedure, visible in call statements; thus parameter passing is a normal assignment to these objects. The call statement therefore has an untypical scope: it is the union of the calling environment and some objects from the local environment of the called procedure. In the context of QLs this approach has many advantages. In particular, it clarifies the situation when parameters are complex structures; it does not require new syntax and semantic models for parameter passing; it makes programs more readable; it supports conceptual modelling (since the names of "formal parameters" are written together with their values); it makes the order of parameters irrelevant; and it allows omitting actual parameters whose values are inessential for a particular call (it deals with a variable number of actual parameters). The method requires a change to typing systems, which are traditionally based on the functional view of the procedure concept.
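Call-by-union can be approximated in Python with keyword arguments. In this hypothetical model the procedure owns a dictionary of named local objects with initial values; a call evaluates its argument expressions in the caller's environment and assigns them to the locals it names. The class name UnionProcedure and the report example are assumptions for illustration, not the INGRES/Windows 4GL API.

```python
from typing import Any, Callable, Dict


class UnionProcedure:
    """Hypothetical model of call-by-union: parameters are named local objects of
    the procedure, and a call simply assigns to those it mentions."""

    def __init__(self, locals_with_defaults: Dict[str, Any],
                 body: Callable[[Dict[str, Any]], Any]) -> None:
        self.locals_with_defaults = locals_with_defaults
        self.body = body

    def __call__(self, **assignments: Any) -> Any:
        local_env = dict(self.locals_with_defaults)  # fresh local environment
        unknown = set(assignments) - set(local_env)
        if unknown:
            raise NameError(f"not a parameter: {sorted(unknown)}")
        local_env.update(assignments)  # parameter passing = assignment by name
        return self.body(local_env)


report = UnionProcedure(
    {"title": "Report", "width": 80, "lines": []},
    lambda env: f"{env['title']} ({env['width']} cols, {len(env['lines'])} lines)",
)

print(report(lines=["a", "b"], title="Sales"))  # order of parameters is irrelevant
print(report(width=40))                         # omitted locals keep initial values
```

Naming each actual parameter at the call site is what makes order irrelevant and omission possible; a typing system would have to check the names rather than positions.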
In LOQIS we use queries as parameters of procedures; as in Algol-68, we assumed the strict call-by-value technique, where references are values. To avoid a syntactic distinction between "mutable" and "immutable" parameter elements, we follow the idea of lazy application of the dereferencing operator. This makes updating through parameters possible. Parameters are treated as local constants. They are evaluated on QRES and then shipped to a special stack, different from ES and QRES. The assumption that the columns of tables returned by queries are unnamed is sometimes inconvenient; we provided a special construct which makes it possible to name the columns, as does the operator ∈ described above.
Side effects of functional procedures. In all popular programming languages the output from a procedure can depend upon the values of global objects, and functional procedures can update global objects. Side effects of functional procedures called in queries are considered dangerous and lead to problems with query optimization (which may change the execution order). Thus we see the need to introduce the attribute "side-effect-free" into the typing system. Otherwise two extreme approaches are possible: to forbid procedure calls inside queries (as in DBPL), which is inconsistent with the orthogonality principle and greatly reduces the power, or to leave everything in the hands of the programmer, as in almost all programming languages.
Output from functional procedures. Because we would like to combine procedure calls with queries, we assume that the output from functional procedures belongs to the same semantic domain as the output of queries. As for parameters, we assume that the dereferencing operator is lazy. This causes a problem for local objects: returning pointers to them should be considered an error, since these objects are deleted when the procedure terminates. In programming languages the problem is solved by typing; here we apply explicit dereferencing.
Views as functional procedures, view updating. Some authors have observed that views can be recursive and can have parameters [Toya86]. It can be shown by examples that functional procedures with a complex output are conceptually equivalent to views in the SQL sense. All properties of views discussed in [Daya89] can be explained through well-known programming concepts. Since the output can contain pointers, it is possible to update the database through a view. DBPL selectors [ERMS91] are examples of such updatable views. There is, however, some undesirable homonymy in the terminology. A view, as understood in conceptual modelling, is a virtual data structure; this structure can be determined, in particular, by a view understood as a functional procedure. Updating through a view (a functional procedure) is not equivalent to updating a view (a virtual data structure). The mapping of updates of this structure into updates of stored data is ambiguous, sometimes impossible, and depends upon data semantics and the user's intention. Ideally, view updating should be transparent to the programmer: there should be no syntactic or pragmatic difference between updating stored data and updating views. The problem of transparent view updates has received a lot of attention in the relational model, and it is a promising research direction in our approach. We use the following syntax:
Declaration of a procedure: procedure <name>(<formal parameters>) begin <body> end <name>;
The list of formal parameters may be empty. Procedure call (we associate names of formal parameters with actual parameters):
<name>(p1: q1; p2: q2; ...)
where pi is the name of the i-th formal parameter and qi is the query associated with this parameter. For procedures without parameters we use the syntax <name>( ) or simply <name>.
Statement return: we assume two forms: return and return <query>. Local objects: we introduce a statement for the declaration/creation of an object with the syntax create local <specification of the object>.
[Figure 6: Binding of a name in LOQIS. Scoping rules for binding the name n. The stack of parameters holds the actual parameters of p1, the actual parameters of p2, and so on. The environment stack holds, from the top: the identifiers of objects ENO, NAME, SAL, ... nested in the currently tested EMP object; the identifiers of local objects of p1; the identifiers of local objects of p2; ...; the identifiers of global objects of the module of p1; the identifiers of global objects of the module of p2; ...; the identifiers of global objects of the main module. Some sections are marked as invisible.]
In this paper we do not consider further details of the syntax and semantics of procedures. Figure 6 presents the environment stack of LOQIS during the binding of the name n occurring in the query EMP where f(n), which is inside the body of a procedure p1, which is called from a procedure p2. The figure does not present modifications of the scope rules by viewers and external interfaces of modules.
Examples (all of them can be written in the LOQIS syntax)
A functional procedure `poor' has a list of jobs as a parameter. It returns pointers to the names, salaries, and department names of employees who do one of the specified jobs and earn less than the average.
procedure poor( JOBS ) begin
  create local AVERAGE( avg( EMP.SAL ) );
  (* Creating 0 or more pointer objects POOR pointing to the proper EMPs *)
  create local POOR( pointer to EMP where JOB in JOBS and SAL < AVERAGE );
  return POOR.EMP.( (N ∈ NAME) (S ∈ SAL) (D ∈ WORKS_IN.DEPT.DNAME) );
end poor;
Give the names of poor clerks and programmers from the department 'Sales':
(poor( JOBS: "clerk" ⊔ "programmer" ) where (D = "Sales")).N
Increase the salaries of poor programmers from the department 'Sales' by 100:
:= (poor( JOBS: "programmer" ) where (D = "Sales")).(S (S + 100))
A procedure `ChangeDept' has the parameters EMPS, storing pointers to employees, and DEP, storing a pointer to a department. It moves the specified employees to the specified department.
procedure ChangeDept( EMPS; DEP ) begin
  for each e ∈ EMPS do begin
    delete e.WORKS_IN.DEPT.EMPLOYS where EMP = e;
    create and insert the object EMPLOYS( pointer to e ) into DEP;
    := pointer (e.WORKS_IN) DEP;
    e.EDNO := DEP.DNO
  end
end ChangeDept;
Let Brown take all designers working for Smith:

    ChangeDept( EMPS: EMP where JOB = "designer" and WORKS_IN.DEPT.MANAGER.EMP.NAME = "Smith";
                DEP: DEPT where (MANAGER.EMP.NAME = "Brown") )
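The procedures `poor' and `ChangeDept' and the queries that use them can be sketched in Python over a tiny in-memory database. Everything here is a hypothetical stand-in: Python object references play the role of LOQIS pointers, the names `Dept`, `Emp`, `hire`, `poor` and `change_dept` are invented for the sketch, the salaries are made up, and the assignment to EDNO is omitted because the sketch has no such attribute. Because `poor` returns a reference to each employee next to the N, S and D values, the salary update can be made through the result, much as the pointers returned by the LOQIS procedure allow.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass(eq=False)
class Dept:
    dname: str
    employs: list = field(default_factory=list)   # EMPLOYS: references to Emp objects
    manager: "Emp | None" = None                  # MANAGER: reference to an Emp object

@dataclass(eq=False)
class Emp:
    name: str
    sal: int
    job: str
    works_in: "Dept | None" = None                # WORKS_IN: reference to a Dept object

EMP = []                                          # all EMP objects of the toy database

def hire(name, sal, job, dept):
    """Hypothetical helper: create an Emp and link it in both directions."""
    e = Emp(name, sal, job, dept)
    dept.employs.append(e)
    EMP.append(e)
    return e

def poor(jobs):
    average = mean(e.sal for e in EMP)                              # local AVERAGE
    refs = [e for e in EMP if e.job in jobs and e.sal < average]    # local POOR
    # N, S, D as values, plus the Emp reference so that callers can update through it
    return [{"N": e.name, "S": e.sal, "D": e.works_in.dname, "EMP": e} for e in refs]

def change_dept(emps, dep):
    for e in emps:
        old = e.works_in
        old.employs = [x for x in old.employs if x is not e]  # delete the EMPLOYS entry
        dep.employs.append(e)                                  # insert an EMPLOYS entry into DEP
        e.works_in = dep                                       # re-point WORKS_IN

sales, research = Dept("Sales"), Dept("Research")
research.manager = hire("Smith", 5000, "manager", research)
sales.manager = hire("Brown", 4000, "manager", sales)
for args in [("Adams", 1000, "clerk", sales), ("Clark", 1500, "programmer", sales),
             ("Davis", 2000, "designer", research), ("Evans", 2200, "designer", research)]:
    hire(*args)

# (poor( JOBS: "clerk" ⊔ "programmer" ) where (D = "Sales")).N
poor_sales = [r["N"] for r in poor({"clerk", "programmer"}) if r["D"] == "Sales"]

# Increase by 100 the salaries of poor programmers from 'Sales'
for r in poor({"programmer"}):
    if r["D"] == "Sales":
        r["EMP"].sal += 100

# Let Brown take all designers working for Smith
designers = [e for e in EMP if e.job == "designer" and e.works_in.manager.name == "Smith"]
change_dept(designers, next(d for d in (sales, research) if d.manager.name == "Brown"))
print(poor_sales, [e.name for e in sales.employs])
```

Keeping both directions of the relationship (EMPLOYS and WORKS_IN) consistent is left to `change_dept`, as it is to ChangeDept in the LOQIS version.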
Define a view MyView(Dname, AvgSal, Mgr(Name, Sal)) containing the department names, average salaries, and manager names and salaries for departments located in Paris.

    procedure MyView( )
    begin
      return (DEPT where "Paris" in LOC).( (Dname ∈ DNAME) (AvgSal ∈ avg(EMPLOYS.EMP.SAL))
             (Mgr ∈ (MANAGER.EMP.( (Name ∈ NAME) (Sal ∈ SAL) ))) )
    end MyView;
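The view MyView can be approximated by a Python function that recomputes its rows on every call, followed by the first query below. This is a hedged sketch with hypothetical data and names (`my_view`, and dictionaries standing in for DEPT and EMP objects); unlike the LOQIS view, its rows hold copies of values rather than references, so it does not support the update through the view.

```python
from statistics import mean

# Hypothetical data: MANAGER and EMPLOYS hold the referenced EMP records directly.
smith = {"NAME": "Smith", "SAL": 5000}
brown = {"NAME": "Brown", "SAL": 4000}
DEPT = [
    {"DNAME": "Sales", "LOC": ["Paris", "Lyon"], "MANAGER": brown,
     "EMPLOYS": [brown, {"NAME": "Adams", "SAL": 1000}]},
    {"DNAME": "Research", "LOC": ["Paris"], "MANAGER": smith,
     "EMPLOYS": [smith, {"NAME": "Davis", "SAL": 2000}]},
    {"DNAME": "Admin", "LOC": ["Warsaw"], "MANAGER": {"NAME": "Kowalski", "SAL": 9000},
     "EMPLOYS": [{"NAME": "Nowak", "SAL": 9000}]},
]

def my_view():
    """MyView(Dname, AvgSal, Mgr(Name, Sal)) for departments located in Paris."""
    return [
        {"Dname": d["DNAME"],
         "AvgSal": mean(e["SAL"] for e in d["EMPLOYS"]),
         "Mgr": {"Name": d["MANAGER"]["NAME"], "Sal": d["MANAGER"]["SAL"]}}
        for d in DEPT if "Paris" in d["LOC"]
    ]

# (MyView where AvgSal = max(MyView.AvgSal)).Mgr.Name
view = my_view()
best = max(row["AvgSal"] for row in view)
answer = [row["Mgr"]["Name"] for row in view if row["AvgSal"] == best]
print(answer)
```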
Give the name of the manager of the department with the highest average salary:

    (MyView where AvgSal = max(MyView.AvgSal)).Mgr.Name

Increase by 200 the salary of the manager of the Sales department:

    := (MyView where Dname = "Sales").Mgr.(Sal (Sal + 200))

In this example, updating the view implies no ambiguities or side effects. See [AtBu87]. The database description is the following:

    repeating Part( Name(string)
                    ( Base( Cost(real) Mass(real) ... )
                      or Composite( AssemblyCost(real) MassIncrement(real)
                                    repeating MadeFrom( Uses(↑Part) Quantity(integer) ... ))))

Define a recursive view costAndMass(name, cost, mass) with the parameter PARTS being a single-column table of identifiers of Part objects; for these parts it returns the name, the total cost, and the total mass.

    procedure costAndMass( PARTS )
    begin
      return PARTS.( (name ∈ Name)
        (cost ∈ (Base.Cost ∪ Composite.(AssemblyCost +
            sum(MadeFrom.(Quantity * costAndMass(PARTS: Uses.Part).cost)))))
        (mass ∈ (Base.Mass ∪ Composite.(MassIncrement +
            sum(MadeFrom.(Quantity * costAndMass(PARTS: Uses.Part).mass))))) )
    end costAndMass;

This procedure can be optimized in the style shown in [AtBu87].
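A recursive Python counterpart of costAndMass is sketched below. The encoding of the Base/Composite variant as two dataclasses, the representation of MadeFrom entries as (part, quantity) pairs, and the sample parts are assumptions made for illustration; the union ∪ of the two variant branches becomes an `isinstance` test, since exactly one branch is present for each part.

```python
from dataclasses import dataclass, field

@dataclass
class Base:
    cost: float
    mass: float

@dataclass
class Composite:
    assembly_cost: float
    mass_increment: float
    made_from: list = field(default_factory=list)  # (Part, quantity) pairs: Uses(↑Part), Quantity

@dataclass
class Part:
    name: str
    variant: object                                # exactly one of Base or Composite

def cost_and_mass(parts):
    """Return (name, total cost, total mass) for each part, recursing through MadeFrom."""
    result = []
    for p in parts:
        v = p.variant
        if isinstance(v, Base):
            cost, mass = v.cost, v.mass
        else:
            cost, mass = v.assembly_cost, v.mass_increment
            for used, quantity in v.made_from:
                _, c, m = cost_and_mass([used])[0]
                cost += quantity * c
                mass += quantity * m
        result.append((p.name, cost, mass))
    return result

bolt = Part("bolt", Base(2, 1))
plate = Part("plate", Base(10, 5))
bracket = Part("bracket", Composite(3, 0, [(plate, 1), (bolt, 4)]))
frame = Part("frame", Composite(5, 1, [(bracket, 2), (bolt, 2)]))
print(cost_and_mass([frame, bolt]))
```

The sketch recomputes a shared subpart (here `bolt`) every time it is used; caching the result per part would be one way to avoid the repeated work.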
5.4 Object Orientation
Some features of object-oriented databases are already assumed in our framework, in particular object identifiers, complex objects, object sharing, path expressions, complex values, and integration of QLs with PLs. The idea of object-orientation includes other concepts that are relevant to QLs, namely, classes, inheritance, methods, and encapsulation.
Classes. There are many definitions of the class concept and of the relationships between classes and types. We do not consider the subtleties of these definitions and their dependence on a concrete system or theory. For our purposes the following view is convenient: a class is an object storing invariants for other objects (the elements of the class). These invariants are also objects; they may include methods/procedures, default attributes [WKS89], common values of attributes, constraints, typing information, etc. Invariants are inherited by the elements of the class and by the elements of its subclasses. Two kinds of inheritance can be considered: static and dynamic. In the static case (e.g. C++) classes are not first-class citizens; hence the inheritance of invariants works at compile time only. For systems with late binding we can also consider dynamic inheritance of invariants. A promising concept for dynamic inheritance, which we called viewers [SMSRW93], is implemented in LOQIS. Classes have the following consequences for QLs:

Structural inheritance. Objects from subclasses inherit the structural properties of their superclasses. For instance, an object from the class STUDENT is considered to be in the class PERSON; hence queries addressing objects of the class STUDENT may contain names of attributes that are defined for the class PERSON. Moreover, queries addressing objects of the class PERSON need to take the STUDENT objects into account. This property leads to semantic ambiguity; in POSTGRES [SRH90] such queries require an explicit syntactic distinction. The semantics of structural inheritance can be easily incorporated into the proposed mechanism. A concrete solution depends on the citizenship status of the class concept. When classes are not first-class objects, we can store the graph of relationships between classes; then we can apply rewriting rules to some queries before executing them. The rules substitute names referring to some class C by the union of the names of all subclasses of C; for example, PERSON will be substituted by (PERSON ∪ STUDENT ∪ EMPLOYEE ∪ ...). Then the names occurring in a query can be bound as usual. When classes are first-class objects, the solution depends on the storage model assumed for objects and classes.
For example, we can extend our storage model by assuming that each object can have many names; an object with names n1, n2, ... is bound to a name n occurring in a query if ni = n for some i. Then we assume that each object has the name of its class and the names of all its superclasses as synonyms. For example, a STUDENT object has both names STUDENT and PERSON. In this case, too, we can apply the standard binding mechanism. Note that the storage model with synonymous object names is more powerful than typical object-oriented models; in particular, it allows us to model dynamically changing object roles [RiSc91]. During the implementation of LOQIS we also considered another storage model, with special pointers connecting a class with all its members (including members of its subclasses); in this case the binding rules must be changed.

Behavioral inheritance, or inheritance of invariants. Classes contain invariants (methods, procedures, default values, common values, etc.), which are inherited by the elements of the class and by the elements of its subclasses. In our framework this means that objects are connected not only to their "own" sub-objects (i.e. attributes), but also to objects stored inside their classes. This can be easily taken into account in the definition of a QL by changing the function nested, introduced previously, and by changing the scope rules to resolve possible name conflicts (i.e. to do "overriding"). Such a modified mechanism is implemented in LOQIS. For example, (EMP where (NAME = "Smith")).fire() denotes an application of a method fire, which is an object stored inside the EMP class. Assume that nested is changed into nested', where

$$\mathit{nested}'(r) = \mathit{nested}(r) \cup \mathit{nested}(\mathit{class}(r)) \cup \mathit{nested}(\mathit{super}(\mathit{class}(r))) \cup \mathit{nested}(\mathit{super}(\mathit{super}(\mathit{class}(r)))) \cup \cdots$$

The functions class and super return the identifiers of the class and of the superclass(es) of the object(s) identified by r, respectively. Such a change of the function nested allows us to consider fire as an attribute of the object, so we can apply the usual binding rules.

Methods. Methods are procedures stored within classes; "message passing" is a terminological and syntactic variant of a procedure call. An example of a call of the method fire is shown above. Queries can be used as parameters of a method and within its body. A method, as a functional procedure, can be called within a query, and a query can be used to determine the context in which the method is to be applied. For example, consider the construct (EMP where (JOB = "analyst")).fire(), with the semantics "Fire all analysts". The procedure fire is applied many times and it has an implicit parameter: the current analyst. To deal with implicit parameters, object-oriented languages introduce the symbol self. Its semantics is very simple in our framework: the symbol self is bound to the row from the top of QRES that is currently being processed by an operator where, dot, ⋈, etc.
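Two of the mechanisms above can be sketched in Python under simplifying assumptions: the rewriting of a class name into the union of its subclass names (for classes that are not first-class), and binding through nested' with overriding and an explicit self. The class graph, the token-level rewriting, and the names `all_subclasses`, `rewrite`, `Obj`, `nested_prime` and `bind` are hypothetical. The sketch follows a single superclass chain for nested', and it realizes the union in nested' as an ordered search (object, class, superclass, ...) so that the most specific definition wins. A real rewriter would also have to distinguish class names from attribute names; the synonym-based storage model is not shown.

```python
from dataclasses import dataclass, field

# Part 1: structural inheritance by query rewriting (classes are not first-class).
SUBCLASSES = {"PERSON": ["STUDENT", "EMPLOYEE"], "STUDENT": ["TUTOR"],
              "EMPLOYEE": ["MANAGER", "TUTOR"]}        # TUTOR has two superclasses

def all_subclasses(name):
    """`name` followed by all its direct and indirect subclasses, without repeats."""
    names = [name]
    for sub in SUBCLASSES.get(name, []):
        names.extend(all_subclasses(sub))
    return list(dict.fromkeys(names))                 # keep first occurrences only

def rewrite(tokens):
    """Replace each class name that has subclasses by the union of all their names."""
    out = []
    for t in tokens:
        names = all_subclasses(t)
        out.append(t if len(names) == 1 else "(" + " ∪ ".join(names) + ")")
    return " ".join(out)

# Part 2: behavioural inheritance through nested'(r).
@dataclass(eq=False)
class Obj:
    attrs: dict = field(default_factory=dict)   # own sub-objects, or a class's invariants
    cls: "Obj | None" = None                    # class(r)
    superclass: "Obj | None" = None             # super(c), set on class objects

def nested_prime(r):
    """Sections of nested'(r), most specific first: r, class(r), super(class(r)), ..."""
    sections, c = [r.attrs], r.cls
    while c is not None:
        sections.append(c.attrs)
        c = c.superclass
    return sections

def bind(r, name):
    """The most specific section that defines `name` wins, which gives overriding."""
    for section in nested_prime(r):
        if name in section:
            return section[name]
    raise NameError(name)

PERSON = Obj({"describe": lambda self: "a person", "COUNTRY": "Poland"})  # COUNTRY: a default attribute
EMP_CLASS = Obj({"describe": lambda self: "employee " + self.attrs["NAME"],
                 "fire": lambda self: self.attrs.update(STATUS="fired")},
                superclass=PERSON)
emps = [Obj({"NAME": "Ann", "JOB": "analyst"}, cls=EMP_CLASS),
        Obj({"NAME": "Bob", "JOB": "clerk"}, cls=EMP_CLASS)]

# (EMP where (JOB = "analyst")).fire(): self is the element currently being processed
for r in emps:
    if bind(r, "JOB") == "analyst":
        bind(r, "fire")(r)

print(rewrite(["PERSON", "where", "AGE", ">", "30"]))
```

Passing the processed element explicitly as `self` mirrors the binding of self to the row currently taken from QRES.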
Encapsulation. An orthodox approach to encapsulation [ABDD+89] assumes that objects can be accessed and processed only by methods, which leaves little room for QLs. There are examples (see e.g. [Daya89]) showing that this orthodoxy is contrary to common sense, and perhaps no object-oriented database approach follows it strictly, see e.g. [Beer89, ClDe92, Cruz89, Daya89, Kim89, KKS92]. We can also argue that in the database domain the concept of a schema has a well-founded tradition as a tool for representing data semantics for the application programmer. A schema describes mainly static data; their behavioral aspects (methods) are add-ons only. Hence, data should be visible to users and programmers at the proper level of data independence. Encapsulation is also a principle of programming languages such as Modula-2 or DBPL, where access to the properties of modules (in particular, objects, procedures and types) is restricted by export/import lists, and the specification part of a module is separated from its implementation part. In this way inessential properties of data and procedures can be hidden or shifted to the lower implementation level. Classes, besides this property, introduce a structure supporting conceptual modelling and (multiple) inheritance. We can therefore consider a class as a concept sharing the properties of modules and supporting inheritance. A class, like a module, exports some properties, in particular methods and some "projection" of objects (the projection hides internal object properties, e.g. attributes). On the other hand, classes are connected into a structure of inheritance relationships, where methods (in general, invariants of objects) are inherited by subclasses. With this understanding we see no contradiction between QLs and encapsulation.
6 Query Optimization

In this paper we do not discuss many methods and aspects of query optimization that are relevant to the class of QLs we have proposed. In particular, we could present many rules for the equivalent transformation of queries. It is also obvious that the typical optimizations known from the relational model (selections through indices, selections before joins, efficient join methods) are applicable to some queries of a QL developed according to the stack-based framework. In general, current optimization methods are very sensitive to particular query patterns: even a small deviation from a pattern makes the optimization method inapplicable. This problem is addressed in Exodus [GrDe87] and Postgres [SRH90], where parameterized query optimizers are developed. Parameterization allows the optimizer to be adapted to new query patterns and data organizations. Below we describe an easy-to-implement method that is to a large extent independent of query patterns. We explain it by examples and then present the general rules. The method is based on observations concerning bindings. Consider the query

    (e ∈ EMP) where e.SAL = (EMP where NAME = "Smith").SAL

It is not optimal, since the nested subquery (EMP where NAME = "Smith").SAL will be evaluated as many times as there are employees. The nested query should be evaluated in advance (or during the first loop performed by the outer query). In [Kim82] this kind of query is called type-A or type-N. Consider another query

    (e ∈ EMP) where e.SAL = max((EMP where DNO = e.DNO).SAL)

This time the inner query cannot be executed in advance, because it depends on the outer object e. In [Kim82] this kind of query is called type-JA or type-J. We need criteria to recognize that an inner query is independent of the outer part. As we argued in the section devoted to variants and null values, each name occurring in a query that has to be bound should be statically relativized to the environment stack.
That is, during static analysis of the query each name can be associated with two relative numbers: the size of the stack when the name is bound, and the level of the stack where the corresponding pointers have to be found. We use the term relative since (as usual in languages supporting procedure nesting and recursion) stack sections are relativized according to the scoping rules: we are interested only in sections that are visible through the scoping rules, and at each program point we may abstract from invisible sections. Thus, without loss of generality, we can assume that the execution of a query (including queries nested in procedures) starts with an environment stack having only one section. With this assumption, the relative numbers (StackSize, BindingLevel) for the names occurring in the first query

    (e ∈ EMP) where e.SAL = (EMP where NAME = "Smith").SAL

are the following, in order of occurrence: EMP (1, 1), e (2, 2), SAL (3, 3), EMP (2, 1), NAME (3, 3), SAL (3, 3); and for the second query
    (e ∈ EMP) where e.SAL = max((EMP where DNO = e.DNO).SAL)

they are: EMP (1, 1), e (2, 2), SAL (3, 3), EMP (2, 1), DNO (3, 3), e (3, 2), DNO (4, 4), SAL (3, 3).

The reason why the subquery in the second query is not independent of the outer query is the following: the name e is bound on the 2nd level of the stack, but precisely this level is continually changed by the loop implied by the operator where of the outer query. In the first query no name occurring in the inner subquery is bound on the 2nd level; hence it is independent. These examples illustrate a principle that can easily be generalized. We observe that in making the above inference: (1) we are not interested in how complex the part of the outer query before and after where is; (2) we are not interested in how complex the inner query is and how it is constructed: only the formal properties of bindings are essential; (3) we are not interested in which operators connect the inner query with the outer query; (4) we can repeat the inference without change if, instead of where, we consider projection, join, quantifiers, transitive closures, and order by.

We now formulate the method in more general terms. Assume the syntax $q_1 \;\theta\; f(q_2)$, where $q_1$ is a query and $\theta$ is an operator such as where, dot, ⋈, ∃, ∀ (for uniformity, we consider quantifiers as infix operators), closed by, order by, etc.; the semantics of these operators is based on iteratively opening a new scope on the environment stack. $q_2$ is an inner query, and $f$ represents a syntactic construct involving $q_2$. Because we make no assumptions concerning $f$, the inner subquery $q_2$ may participate in several loops implied by the above operators. Let $\langle n_1, n_2, \ldots, n_k \rangle$ denote the sequence of names occurring in $q_2$, and let $\langle (s_1, b_1), (s_2, b_2), \ldots, (s_k, b_k) \rangle$ denote the corresponding sequence of pairs of numbers associated with these names: $s_i$ denotes the stack size when $n_i$ is bound, and $b_i$ denotes the stack level where $n_i$ is bound. Let $\mathit{StackSize}_{q_2}$ denote the size of the stack when the evaluation of $q_2$ starts. Usually

$$\mathit{StackSize}_{q_2} = \min_{1 \le i \le k} s_i$$

(but in general $q_2$ may contain no names). The inner query $q_2$ is totally independent of the outer query, and may be evaluated in advance (or, lazily, once when needed), if:
- it contains no data names, or
- for each $b_i$: $b_i = 1$ or $b_i > \mathit{StackSize}_{q_2}$.

That is, the inner query is totally independent of the outer query if no name occurring in it is bound in a section of ES that can be changed by the operator $\theta$ or by similar operators occurring in $f$. Consider another example ("For each department give the number of employees earning more than their manager").
    DEPT ⋈ count(EMPLOYS.EMP where (SAL > MANAGER.EMP.SAL))

Intuitively, the nested subquery MANAGER.EMP.SAL is independent of the where loop processing EMP objects, but it depends on the external ⋈ loop processing DEPT objects. Thus the subquery is partly independent: it can be evaluated once per iteration of the external loop and need not be evaluated inside the internal loop. As before, partial independence can be expressed in general terms. The evaluation of $q_2$ can be shifted outside $j$ internal nested loops if for each $b_i$:

$$b_i \le \mathit{StackSize}_{q_2} - j \quad \text{or} \quad b_i > \mathit{StackSize}_{q_2}$$

The partial independence of queries generalizes the well-known rule "do selections before joins". Indeed, consider the query

    DEPT ⋈ (EMPLOYS.EMP where "Paris" in LOC and JOB = "clerk")

The inner subquery "Paris" in LOC is partially independent and can be moved out of the internal where loop. Thus it will participate only in the loop over DEPT objects. Taking into account the additivity of the ⋈ operator with respect to its first argument, we can transform the query into a more efficient form:

    (DEPT where ("Paris" in LOC)) ⋈ (EMPLOYS.EMP where JOB = "clerk")

We can also show that the decomposition method for conjunctive queries [WoYo76] can be generalized in our framework through the concept of the partial independence of subqueries.
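The tests of this section can be sketched in Python, together with the two effects they justify. This is a minimal sketch, assuming that static analysis has already annotated every name of $q_2$ with its (StackSize, BindingLevel) pair; the functions `is_independent` and `max_shift`, the toy data, and the dictionary encoding are hypothetical. The pairs (4, 2), (5, 5), (6, 6) for MANAGER, EMP, SAL in MANAGER.EMP.SAL are our own derivation with the same bookkeeping used for the two annotated queries, not values given above. Membership (`in`) stands in for comparing e.SAL with the one-element result of the subquery, and `max_shift` assumes that every section above the bottom one was opened by a loop.

```python
def is_independent(pairs, stack_size=None):
    """Total independence of q2: no names, or every b_i == 1 or b_i > StackSize_q2."""
    if not pairs:
        return True                                   # q2 contains no data names
    if stack_size is None:
        stack_size = min(s for s, _ in pairs)         # the usual case: min over s_i
    return all(b == 1 or b > stack_size for _, b in pairs)

def max_shift(pairs, stack_size):
    """Largest j such that every b_i <= stack_size - j or b_i > stack_size."""
    below = [b for _, b in pairs if b <= stack_size]  # bindings into sections under q2
    if not below:
        return stack_size - 1                         # q2 can leave every loop
    return stack_size - max(below)

FIRST = [(2, 1), (3, 3), (3, 3)]                      # EMP, NAME, SAL
SECOND = [(2, 1), (3, 3), (3, 2), (4, 4), (3, 3)]     # EMP, DNO, e, DNO, SAL
MANAGER_SAL = [(4, 2), (5, 5), (6, 6)]                # MANAGER, EMP, SAL (derived)

# Hoisting the independent subquery (EMP where NAME = "Smith").SAL
EMP = [{"NAME": "Smith", "SAL": 2000}, {"NAME": "Jones", "SAL": 2000},
       {"NAME": "Brown", "SAL": 3000}]
counter = {"smith_sal": 0}

def smith_sal():
    """(EMP where NAME = "Smith").SAL; counts its own evaluations."""
    counter["smith_sal"] += 1
    return [e["SAL"] for e in EMP if e["NAME"] == "Smith"]

naive = [e["NAME"] for e in EMP if e["SAL"] in smith_sal()]      # once per employee
naive_evaluations = counter["smith_sal"]
hoisted_value = smith_sal()                                        # once, in advance
hoisted = [e["NAME"] for e in EMP if e["SAL"] in hoisted_value]
hoisted_evaluations = counter["smith_sal"] - naive_evaluations

# "Do selections before joins": the partially independent "Paris" in LOC moves outward.
DEPT = [{"DNAME": "Sales", "LOC": ["Paris"],
         "EMPLOYS": [{"NAME": "Adams", "JOB": "clerk"}, {"NAME": "Clark", "JOB": "analyst"}]},
        {"DNAME": "Research", "LOC": ["Lyon"],
         "EMPLOYS": [{"NAME": "Davis", "JOB": "clerk"}]}]
original = [(d["DNAME"], e["NAME"]) for d in DEPT for e in d["EMPLOYS"]
            if "Paris" in d["LOC"] and e["JOB"] == "clerk"]
pushed = [(d["DNAME"], e["NAME"]) for d in DEPT if "Paris" in d["LOC"]
          for e in d["EMPLOYS"] if e["JOB"] == "clerk"]

print(is_independent(FIRST), is_independent(SECOND), max_shift(MANAGER_SAL, 4))
```

For MANAGER.EMP.SAL the sketch gives j = 2: the subquery can leave both the where loop and the dot loop over EMPLOYS.EMP, but not the ⋈ loop over DEPT, which matches the intuitive reading of the example.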
7 Conclusion

In this paper we have presented an approach to query languages based on a modification of concepts known from programming languages. We believe the approach makes it possible to achieve a proper level of pragmatic universality and precision in the specification of semantics. Although our presentation is semi-formal, it is easy to see that the approach makes it possible to build powerful mathematical models. We have shown that the classical stack-based mechanism of programming languages can be modified in order to process declarative queries. Various query language concepts and operators are formally defined by simple stack operations, without referring to theoretical frameworks such as relational algebra, calculus or logic. This allows us to avoid some limitations implied by these frameworks. The proposed approach allows one to build powerful query languages for a variety of data models, in particular for relational and object-oriented models. Following the strong definitional discipline "as few concepts as possible", we obtained a level of uniformity that creates a new potential for query optimization. The presented approach has been implemented in the system LOQIS, and this experience is quite encouraging.
References

[AhUl79] A.V. Aho, J.D. Ullman. Universality of Data Retrieval Languages. Proc. 6th ACM Symposium on Principles of Programming Languages, San Antonio, TX, Jan. 1979, ACM, New York, pp.110-117.
[ACO85] A. Albano, L. Cardelli, R. Orsini. Galileo: A Strongly-Typed, Interactive Conceptual Language. ACM Transactions on Database Systems, Vol.10, No.2, 1985, pp.230-260.
[ABDD+89] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, S. Zdonik. The Object-Oriented Database System Manifesto. Proc. 1st DOOD Conf., Kyoto, pp.40-57, 1989.
[AtBu87] M.P. Atkinson, O.P. Buneman. Types and Persistence in Database Programming Languages. ACM Computing Surveys, Vol.19, No.2, pp.105-190, 1987.
[BBKV87] F. Bancilhon, T. Briggs, S. Khoshafian, P. Valduriez. FAD, a Powerful and Simple Database Language. Proc. 13th VLDB Conf., Brighton, pp.97-105, 1987.
[Beer89] C. Beeri. Formal Models for Object-Oriented Databases. Proc. 1st DOOD Conf., Kyoto, pp.370-395, 1989.
[Card89] L. Cardelli. Typeful Programming. DIGITAL Systems Research Center, Palo Alto, Report No.45, May 1989.
[CDV88] M.J. Carey, D.J. DeWitt, S.L. Vandenberg. A Data Model and Query Language for EXODUS. Proc. ACM SIGMOD Annual Conf., pp.413-423, 1988.
[ClDe92] S. Cluet, C. Delobel. A General Framework for the Optimization of Object-Oriented Queries. Proc. ACM SIGMOD Conf., pp.383-392, 1992.
[CODA71] CODASYL Database Task Group Report, ACM, New York, 1971.
[Codd79] E.F. Codd. Extending Database Relations to Capture More Meaning. ACM Transactions on Database Systems, Vol.4, No.4, 1979, pp.397-434.
[Cruz89] I.F. Cruz. Declarative Query Languages for Object-Oriented Databases. Office and Data Base Systems Research '89 (Ed. F.H. Lochowsky), Technical Report CSRI-238, Computer Systems Research Institute, University of Toronto, pp.92-130, June 1990.
[CMW87] I.F. Cruz, A.O. Mendelzon, P.T. Wood. A Graphical Query Language Supporting Recursion. Proc. ACM SIGMOD Conf., pp.323-330, 1987.
[Daya89] U. Dayal. Queries and Views in an Object-Oriented Data Model. Proc. 2nd DBPL Workshop, Gleneden Beach, Oregon, pp.80-102, 1989.
[Deux+90] O. Deux et al. The Story of O2. IEEE Transactions on Knowledge and Data Engineering, Vol.2, No.1, pp.91-108, 1990.
[ERMS91] J. Eder, A. Rudloff, F. Matthes, J.W. Schmidt. Data Construction with Recursive Set Expressions. Next Generation Information System Technology, Proc. 1st East/West Database Workshop, Kiev, USSR, Oct. 1990, Springer Lecture Notes in Computer Science 504, 1991, pp.271-293.
[GrDe87] G. Graefe, D.J. DeWitt. The EXODUS Optimizer Generator. Proc. ACM SIGMOD 87 Conf., 1987, pp.160-172.
[HFLP89] L.M. Haas, J.C. Freytag, G.M. Lohman, H. Pirahesh. Extensible Query Processing in Starburst. Proc. ACM SIGMOD Conf., pp.377-388, 1989.
[Ingr89] Using INGRES Through Forms and Menus for the UNIX and VMS Operating Systems. INGRES Release 6, Relational Technology, June 1989.
[Ingr90] Language Reference Manual for INGRES/Windows 4GL for the UNIX and VMS Operating Systems. INGRES Release 6, Ingres Corporation, August 1990.
[Kim82] W. Kim. On Optimizing an SQL-like Nested Query. ACM Transactions on Database Systems, Vol.7, No.3, 1982, pp.443-469.
[Kim89] W. Kim. A Model of Queries for Object-Oriented Databases. Proc. 15th VLDB Conf., Amsterdam, The Netherlands, pp.423-432, 1989.
[KKS92] M. Kifer, W. Kim, Y. Sagiv. Querying Object-Oriented Databases. Proc. ACM SIGMOD Conf., pp.393-402, 1992.
[KGBW90] W. Kim, J.F. Garza, N. Ballou, D. Woelk. Architecture of the ORION Next-Generation Database System. IEEE Transactions on Knowledge and Data Engineering, Vol.2, No.1, 1990, pp.109-124.
[Mant91] R. Manthey. Declarative Languages - Paradigm of the Past or Challenge of the Future? Proc. 1st Intl. East/West Database Workshop on Next Generation Information System Technology, Kiev, USSR, 1990, Springer Lecture Notes in Computer Science, Vol.504, pp.1-16, 1991.
[MRSS92a] F. Matthes, A. Rudloff, J.W. Schmidt, K. Subieta. The Database Programming Language DBPL, User and System Manual. FIDE, ESPRIT BRA Project 3070, Technical Report Series, FIDE/92/47, 1992.
[MBCD89] R. Morrison, F. Brown, R. Connor, A. Dearle. The Napier88 Reference Manual. Universities of St Andrews and Glasgow, Departments of Computer Science, Persistent Programming Report 77, July 1989.
[MBW80] J. Mylopoulos, P.A. Bernstein, H.K.T. Wong. A Language Facility for Designing Database-Intensive Applications. ACM Transactions on Database Systems, Vol.5, No.2, 1980, pp.185-207.
[O2Ma92] The O2 User Manual, Version 4.1. O2 Technology, Versailles, France, October 1992.
[OBB89] A. Ohori, P. Buneman, V. Breazu-Tannen. Database Programming in Machiavelli - a Polymorphic Language with Static Type Inference. Proc. ACM SIGMOD 89 Conf., 1989, pp.46-57.
[Orac91] PL/SQL, User Guide and Reference, Version 1.0, June 1991. Oracle Corporation, 1991.
[Ott92] N. Ott. Aspects of the Automatic Generation of SQL Statements in a Natural Language Query Interface. Information Systems, Vol.17, No.2, pp.147-159, 1992.
[PPT91] J. Paradaens, P. Peelman, L. Tanca. G-Log: A Declarative Graphical Query Language. Proc. 2nd Intl. Conf. on Deductive and Object-Oriented Databases, Munich, Germany, Springer LNCS 566, pp.108-128, 1991.
[RiSc91] J. Richardson, P. Schwarz. Aspects: Extending Objects to Support Multiple, Independent Roles. Proc. ACM SIGMOD 91 Conf., 1991, pp.298-307.
[Schm77] J.W. Schmidt. Some High Level Language Constructs for Data of Type Relation. ACM Transactions on Database Systems, Vol.2, No.3, 1977, pp.247-261.
[ScMa92] J.W. Schmidt, F. Matthes. The Database Programming Language DBPL, Rationale and Report. FIDE, ESPRIT BRA Project 3070, Technical Report Series, FIDE/92/46, 1992.
[SFL81] J.M. Smith, S. Fox, T. Landers. Reference Manual for ADAPLEX. Technical Report CCA-81-02, Computer Corporation of America, 1981.
[SRH90] M. Stonebraker, L.A. Rowe, M. Hirohama. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, Vol.2, No.1, pp.125-142, 1990.
[SRLG+90] M. Stonebraker, L.A. Rowe, B. Lindsay, J. Gray, M. Carey, M. Brodie, P. Bernstein, D. Beech (The Committee for Advanced DBMS Function). Third-Generation Data Base System Manifesto. ACM SIGMOD Record, Vol.19, No.3, pp.31-44, 1990.
[Subi85] K. Subieta. Semantics of Query Languages for Network Databases. ACM Transactions on Database Systems, Vol.10, No.3, pp.347-394, 1985.
[SuMi86] K. Subieta, M. Missala. Semantics of Query Languages for the Entity-Relationship Model. Proc. 5th Conf. on Entity-Relationship Approach, Dijon, France, pp.197-216, 1986.
[SuRz87] K. Subieta, W. Rzeczkowski. Query Optimization by Stored Queries. Proc. 13th VLDB Conf., Brighton, England, 1987, pp.369-380.
[SMA90] K. Subieta, M. Missala, K. Anacki. The LOQIS System. Institute of Computer Science, Polish Academy of Sciences, Report 695, 1990.
[Subi91] K. Subieta. LOQIS: The Object-Oriented Database Programming System. Proc. 1st Intl. East/West Database Workshop on Next Generation Information System Technology, Kiev, USSR, 1990, Springer Lecture Notes in Computer Science, Vol.504, pp.403-421, 1991.
[SMSRW93] K. Subieta, F. Matthes, J.W. Schmidt, A. Rudloff, I. Wetzel. Viewers: A Data-World Analogue of Procedure Calls. Proc. 19th VLDB Conf., Dublin, Ireland, 1993, pp.269-277.
[Toya86] M. Toyama. Parameterized View Definitions and Recursive Relations. Proc. Conf. on Data Engineering, Los Angeles, IEEE Computer Society, pp.707-712, 1986.
[Ullm91] J.D. Ullman. A Comparison of Deductive and Object-Oriented Database Systems. Proc. 2nd Intl. Conf. on Deductive and Object-Oriented Databases, Munich, Germany, Springer LNCS 566, pp.263-277, 1991.
[WaGo84] W.M. Waite, G. Goos. Compiler Construction. Springer, 1984.
[WKS89] W. Wilkes, P. Kahold, G. Schlageter. Complex and Composite Objects in CAD/CAM Databases. Proc. 5th Conf. on Data Engineering, Los Angeles, California, pp.443-450, 1989.
[WLH90] K. Wilkinson, P. Lyngbæk, W. Hasan. The Iris Architecture and Implementation. IEEE Transactions on Knowledge and Data Engineering, Vol.2, No.1, 1990, pp.63-75.
[WoYo76] E. Wong, K. Youssefi. Decomposition - A Strategy for Query Processing. ACM Transactions on Database Systems, Vol.1, No.3, 1976, pp.223-241.
[Zani83] C. Zaniolo. The Database Language GEM. Proc. ACM SIGMOD Conf., pp.423-434, 1983.
[ZhMe83] Z.G. Zhang, A.O. Mendelzon. A Graphical Query Language for Entity-Relationship Databases. In: Entity-Relationship Approach to Software Engineering. North-Holland, Amsterdam, 1983.