Object-Oriented Standards: Can ODMG OQL be Extended to a Programming Language?

Kazimierz Subieta

Institute of Computer Science, Warsaw, Poland

Abstract

OQL is a query language proposed in the ODMG-93 standard as a tool for declarative access to object bases. We argue that bindings of OQL to universal programming languages (C++, Smalltalk, Java) must inevitably lead to the infamous impedance mismatch that was one of the major points of criticism of relational languages by the object-oriented school. To a large extent this criticism is no longer relevant, as many relational and extended relational languages avoid the impedance mismatch by integrating programming constructs, for example the SQL3 standard, Oracle PL/SQL, or visual programming interfaces such as IBM VisualAge. As a remedy, in this paper we discuss the integration of OQL with imperative programming constructs and abstractions in the spirit of the stack-based approach to object-oriented integrated query/programming languages.

1 Introduction

The object database standard ODMG-93 [9] is an important milestone in the development of object bases [3]. Perhaps its most essential contribution is that it establishes a concrete architecture and concrete languages out of a tremendous amount of general or fuzzy considerations and incompatible proposals. ODMG-93 represents a revolutionary approach, which abandons the ideological assumptions of the relational model. This contrasts with the extended-relational approaches advocated in [10, 14, 26] and implemented in Starburst, UniSQL and Illustra, which retain some features of relational systems and add many novel ones; the resulting solutions are perceived as eclectic, irregular and redundant, and thus very difficult to standardize. The main strength of ODMG-93 is a common, non-eclectic, relatively minimal and universal model that has a good chance of being adopted by wide research and industrial communities. ODMG-93 is intended to be a portability standard for some family of object systems (in contrast to the interoperability standard of OMG CORBA [18], which was the main pattern for the ODMG-93 object model), although the current specification raises doubts whether a proper level of portability is feasible [13, 17, 22, 23, 24, 24, 32, 33].

Proceedings of the International Symposium on Cooperative Database Systems for Advanced Applications, December 5-7, 1996, Heian Shrine, Kyoto, Japan, pp. 546-555

OQL is a query language proposed in ODMG-93 as a tool for declarative access to object bases. OQL aims at the interactive querying of object databases and the enhancement of database application programming interfaces by very-high-level constructs. However, nowadays interactive querying through ASCII-based query editors has rather few potential users. Interactive querying is much better supported by visual interfaces based on forms, menus, graphics and browsing. OQL is primarily intended for the high-level programming of database applications. The approach assumed in ODMG-93 is the traditional loose coupling, i.e., embedding OQL into the programming language C++ (the Smalltalk binding is in its infancy, a Java binding is planned). We argue that this approach must inevitably lead to the infamous impedance mismatch that was one of the major points of criticism of relational languages by the object-oriented school. The facts are obvious: the syntax is different, binding stages are different, the programming style is different, name spaces are different, scoping is different, there are no bulk and persistent types in C++, no facilities for generic programming in OQL, no smooth parameter passing from OQL to C++ functions, etc. The C++ binding defines classes which comprise query operators (e.g., union, intersection, difference); they duplicate some of the OQL functionality following the C++ nomenclature. This determines OQL as an auxiliary language; it is implicitly assumed that the basic language is C++. However, C++ is not the best choice for programming database applications [1, 21]. It is argued (J.Sutherland, 1995) that in comparison to modern visual 5GLs the productivity of C++ programmers is ca. 10 times lower, and that C++ is considered ugly with respect to such aspects as portability, flexibility, reliability, garbage collection and memory leaks. (Smalltalk is much better, but its relevance to the programming of database applications is questionable.) For people familiar with SQL interfaces to PLs it is clear that considering OQL queries as strings passed to some C++ functions is far from the definition of a complete interface.
The criticism of the impedance mismatch actually works as a boomerang, as many relational and extended relational languages have changed the application programming paradigm through the development of integrated database programming languages, for example Oracle PL/SQL, Illustra, and visual programming interfaces such as CA OpenRoad or IBM VisualAge. The competitive SQL3 standard [2] has to specify a full programming language [15, 16]. Moreover, it is likely that vendors of object systems will conclude that OQL can easily be extended to make macroscopic updates in the style of SQL's create, update, insert, delete, to define database semantic enhancements, such as views, database procedures and rules, and to define scripts for event-driven visual programming interfaces. Because OQL does not include the above-mentioned extensions (footnote 1), this can be a source of mismatch between different implementations and a killer of portability. As a remedy, in this paper we discuss the integration of OQL with imperative programming constructs and abstractions in the spirit of the stack-based approach to object-oriented integrated query/programming languages proposed in [29, 30, 31, 33, 34]. Nowadays there are very few organizations or users that are ready to accept yet another programming language; thus the majority of new developments (e.g. C++, Java, SQL3) rely on the authority of their predecessors. Unfortunately, in the case of OQL the technical and commercial criteria are contradictory. An attempt to rely on the authority of SQL (assumed for OQL) is perceived as a kind of mimicry, because SQL does not support object identity, a sine qua non feature for object bases. Moreover, SQL does not support complex objects, strong typing, classes, methods, inheritance, encapsulation, associative links, path expressions, dependent joins, i.e., those features that make up object-orientation and are the substance of OQL. As a matter of fact, OQL adopts from SQL only its poorly composable, non-orthogonal syntax. Thus we make no legacy trade-offs, trying to address the quality of the technical specification rather than the compliance with commercial buzzwords and stereotypes. The design of a programming language based on a query language is a complex task. In this paper we are unable to give decisive solutions, but we will try to discuss the issues and guidelines that are crucial for a good development. Our experience concerns the development of an object-oriented system LOQIS [27, 28].
LOQIS is based on SBQL, a query language that is very similar to OQL, but more orthogonal, powerful and semantically clean. SBQL is seamlessly integrated with imperative statements, views, classes, procedures and modules. After this experience we developed a theory that we call the stack-based approach to integrated object-oriented query/programming languages. In comparison to former database theories based on the relational algebra or logic (or their current extensions for object bases, e.g., object algebras, comprehensions, or F-logic) the stack-based approach is able to describe and explain consistently a much wider range of semantic features. Because of the clarity, regularity and universality of the semantic description it offers a big scope for query/program optimization. In this paper we do not assume the reader's familiarity with this approach, and try to explain it in natural terms.

2 Making Order in the OQL Semantics

The semantics of OQL is mostly explained by examples or by a description in natural language, which unfortunately is sometimes over-simplified or imprecise [32].

(Footnote 1) Some of the extensions are postulated in [5, 6, 7] and implemented in O2 [19], which is a pattern for OQL.

Thus any attempt to integrate OQL with programming constructs and abstractions must be preceded by cleaning up all problematic semantic features. We have concluded that semantic problems of OQL are caused by the following main reasons:
- Lack of a well-defined model of object instances;
- Lack of a definition of internal structures of the semantic mechanism;
- As a consequence of the previous: very imprecise definition of scoping and binding rules, as well as iterations implied by some operators, e.g. the dependent join or grouping;
- Lack of well-defined domains of results returned by queries.

2.1 Formal Models of Objects and Values

In several cases the specification of OQL semantics refers to types, not to object instances. However, types do not bear all semantic information. For example, integer can be a type description of some constant, variable, parameter of a procedure, or result returned by an expression, but these are very different semantic entities. Types are important and should be discussed in a proper context, e.g. when one presents type inference rules induced by query operators or a static type checking mechanism; however, they should be considered secondary in the description of the semantics of OQL operators. The description of semantics needs precise formal models of the run-time instances that OQL deals with. The models should address: (1) the notions of an object and objects' repositories (modules, classes, extents, etc.); (2) results returned by OQL queries; (3) procedure parameters. In the stack-based approach the formal model of an object reflects its identity, an external name that is used when writing a query/program, and its state. An object is a triple <identifier, name, state>, where state is an atomic value or a set of objects. The formal model of a query result is a table (bag) of tuples, where a tuple may consist of references to objects, atomic values, and special elements that we call binders [31]. [9] asserts that a query "delivers an object", but this is very imprecise: OQL queries return references to objects, some other values, and also structures comprising references, values and auxiliary names (which were used in as clauses). A precise formal model of such structures is a prerequisite for clear OQL semantics.
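The two models can be illustrated by a minimal Python sketch. The class names Obj, Ref and Binder, the integer identifiers, and the use of a list in place of a set of sub-objects are assumptions made for this illustration only; they are not taken from ODMG-93 or from LOQIS.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ref:
    """Reference to a stored object: only its identifier."""
    oid: int


@dataclass(frozen=True)
class Binder:
    """A name paired with a value or reference (e.g. introduced by `as`)."""
    name: str
    value: object


@dataclass
class Obj:
    """An object as a triple: identifier, external name, state."""
    oid: int
    name: str
    state: Union[int, str, list]  # atomic value, or a list of nested Obj


# A tiny object store: one Person object with two atomic sub-objects.
pat = Obj(1, "Person", [Obj(2, "name", "Pat"), Obj(3, "sal", 1000)])
store = {o.oid: o for o in (pat, *pat.state)}

# A query result is a bag (here a list) of tuples that may mix
# references, atomic values and binders.
result = [(Ref(1), 900, Binder("n", Ref(2)))]
```

Note that the result contains only a reference to the Person object, not the object itself; the atomic value 900 and the binder named n are different kinds of elements in the same tuple.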

2.2 Internal Structures of the OQL Engine

A further problem of the OQL semantics concerns the internal structures that are necessary to define OQL operators. (We emphasize that we consider conceptual structures, not implementation.) Consider the following query with the use of a dependent join:

select c.address from Persons as p, p.children as c where c.address.city != p.address.city

The principle of formal semantics is called compositionality: as far as possible the semantics of some syntactic construct should be described in terms of its sub-constructs and independently of the context in which the construct is used. (This is a basic principle of mathematical approaches to semantics, e.g. the denotational semantics.) Consider the construct p.children. Obviously, if one wants to assign to it an independent meaning, then the name p, defined in the previous phrase, should be stored somewhere in an internal structure, together with the value that it actually assumes in a loop iterating over Persons. Hence the phrase Persons as p (followed by a comma denoting the dependent join) creates a temporary variable named p storing a reference to some Person object. (The assumption that p is a variable whose value is the current element of an iteration is very ambiguous; for example, what will happen if one updates the variable? If we assume updating of Persons via p, the full picture is much more complicated, and cannot be reduced to considering p as some temporary variable.) This variable is just a part of the internal structure that should augment the semantic model of this query. Assume that the construct c.address.city != p.address.city should also have a meaning independent of the context. In this case both names, c and p, together with their actual values (references) should be stored in the internal structures. We conclude that the compositionality principle requires for this query some internal structures that reflect both the internal definitions that are present in this query (names p and c) and the iterations implied by some query operators such as select, where, and the dependent join. In the following we discuss what such structures should look like.
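The argument can be made concrete with a small Python sketch in which the internal structure is an explicit environment (a dictionary of names). The sample data, the function dependent_join and the representation of objects as dictionaries are assumptions of this illustration; the point is only that the where and select parts are evaluated against the environment alone, independently of the surrounding query.

```python
persons = [
    {"name": "Ann", "address": {"city": "Warsaw"},
     "children": [{"name": "Bob", "address": {"city": "Kyoto"}},
                  {"name": "Cid", "address": {"city": "Warsaw"}}]},
    {"name": "Dan", "address": {"city": "Kyoto"}, "children": []},
]


def dependent_join(env, name1, source1, name2, source2):
    """Yield environments extended with name1 and then name2.

    source2 is evaluated in the environment that already contains name1,
    which is what makes the join dependent.
    """
    for a in source1(env):
        env1 = {**env, name1: a}
        for b in source2(env1):
            yield {**env1, name2: b}


def query(persons):
    # from Persons as p, p.children as c
    envs = dependent_join(
        {"Persons": persons},
        "p", lambda e: e["Persons"],
        "c", lambda e: e["p"]["children"],
    )
    # where c.address.city != p.address.city, then select c.address
    return [e["c"]["address"] for e in envs
            if e["c"]["address"]["city"] != e["p"]["address"]["city"]]


result = query(persons)
```

With this data only the child living in a different city than the parent is selected.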

2.3 Binding and Scoping

A binding operation substitutes a name occurring in a program by an internal program entity or a reference to it. For example, a variable name X is substituted by a machine address where the variable's value is stored. The concept of binding is fundamental for PLs' semantics. We argue that it is also fundamental for a query language if one seriously considers its seamless integration with programming capabilities. We have shown [31] that we can develop the semantics of a query language in such a way that each name occurring in a query (c, address, Persons, p, children, city in the above example) is the subject of the binding paradigm. It requires the introduction of conceptual structures storing an internal state of the query execution, such as the above-mentioned variable p. Binding is related to the well-known concept of scoping. Given a name in a query/program that has to be bound, the whole space of program entities that are candidates for the binding is subdivided into parts, called environments. We can distinguish a local environment of a procedure, an environment of database objects, an internal environment of an object (its attributes), an internal environment of a class that the object belongs to, an environment of library procedures, etc. During execution of a query/program the environments change. Changes are caused by many operators, for example, procedure calls, method invocations, select, where and other query operators, the for each iterator, etc. Scoping determines which environments should be visited, and in which order, during the binding of a particular name. This way of thinking is crucial for the whole domain of object-oriented query/programming languages and by no means can be ignored in the description of the OQL semantics. Because the object-oriented model involves such concepts as complex objects, a hierarchy of classes with inheritance, methods (with parameters), encapsulation, etc., an essential issue concerns the order of visiting particular environments during the binding of a name occurring in some query. For example, consider a query of the form:

select ... from Persons where ... X ...

and assume that the query is used in the body of some method m that is defined within the class of Persons. X is some name, but of course the compiler or interpreter does not know a priori what it denotes; it must determine this by a strict procedure. In this case the following environments should be visited during the binding of X, in the order specified below:
- Visit attributes of the Persons object that is currently processed by the where clause; maybe X refers to one of them;
- If not, visit the exported properties of the class (and possibly of its super-classes) that the object inherits; maybe X is a name of some method from this class;
- If not, visit parameters and local variables of m; maybe X is one of them;
- If not, visit private properties of the class Persons; maybe X is some private procedure (or a class variable) called in m;
- If not, visit global database objects, views and database procedures; maybe X refers to some of them;
- If not, visit the global libraries of procedures, functions, template classes, and computer environment variables (date, time,...).
In this scenario we abstract from the fact that some visits are performed during compilation and are optimized by types. The order of visits reflects the "context vicinity" of a name to be bound and follows the rule: first visit the most local context, then go outside to wider and wider contexts. For other cases that may occur in the ODMG-93 object model, the scoping/binding rules and the changes of environments in response to the execution of particular operators that occur in OQL queries are a non-trivial semantic issue. The stack-based approach presents a disciplined method of coping with it. It is centered around the scoping-binding theme. Each name occurring in a query is bound to run-time programming entities (persistent data, volatile data, methods, views, procedures, parameters of procedures, local procedure objects, library procedures and functions, etc.) according to the actual scope for the name. Scopes are organized in an environment stack with the "search from the top" rule. We have introduced necessary modifications to the structure of stacks used in PLs' definitions, e.g., for query languages it is necessary to have multiple simultaneous bindings for a name and the separation of an environment stack from an object store. Fig.1 shows the object store and the environment stack together with the order of visits that reflects the above scenario. Auxiliary names, such as p and s in the presented OQL query, are also reflected on the environment stack; see [31] for details. We argue that the stack is necessary to explain precisely the conceptual semantics related to scoping and binding; it is not an implementation issue.
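The "search from the top" rule with multiple simultaneous bindings can be sketched in Python as follows. The class EnvStack, its method names and the string placeholders standing for run-time entities are hypothetical; the sections are pushed in the reverse of the visiting order described above, so that the attributes of the currently processed object end up on top.

```python
class EnvStack:
    """A stack of sections; each section is a list of (name, entity) binders."""

    def __init__(self):
        self.sections = []

    def push(self, binders):
        self.sections.append(list(binders))

    def pop(self):
        return self.sections.pop()

    def bind(self, name):
        """Search from the top of the stack.

        Returns all entities bound to `name` in the first section that
        contains the name, so a name may have several simultaneous bindings.
        """
        for section in reversed(self.sections):
            hits = [entity for n, entity in section if n == name]
            if hits:
                return hits
        raise NameError(name)


stack = EnvStack()
# Pushed bottom-up: the last section pushed is visited first.
stack.push([("date", "library: date"), ("X", "library: X")])
stack.push([("Persons", "db: person #1"), ("Persons", "db: person #2")])
stack.push([("helper", "private procedure of the class")])
stack.push([("X", "local variable of m")])
stack.push([("age", "exported method of the class")])
stack.push([("name", "attribute name"), ("sal", "attribute sal")])
```

Here stack.bind("X") finds the local variable of m and never reaches the library entity of the same name, while stack.bind("Persons") returns two bindings from a single section.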

[Figure 1 shows the object store and the environment stack. The stack sections, in the order of visits: binders to attributes of the actually processed object; binders to exported and inherited properties of the class; binders to parameters and local variables of the method; binders to private properties of the class; binders to database objects, procedures, views; binders to global procedures, functions, classes, environment variables.]

Figure 1: Scoping and binding through the environment stack

In [31] we have shown that the semantics of query operators such as selection, projection, navigation, dependent join, quantifiers, ordering, transitive closure, can be defined in terms of operations on the environment stack (and the classical stack of results). After a closer look at OQL we are convinced that a correct and precise description of OQL semantics also requires the environment stack. Note that the notion has been well known in the PL domain for over three decades. It just reflects the "context vicinity" rule that is strongly relevant to many PLs and to query languages such as OQL. The stack-based approach makes it possible to explain consistently the nesting of queries into macroscopic imperative statements (creating, updating, inserting, deleting), their integration with programming abstractions (procedures, views, modules, active rules) and parameter passing techniques, as well as their integration with object-oriented features (classes, encapsulation, inheritance, methods, roles). It also enables us to deal with irregular information (null values and unions) [33]. Hence it forms a promising universal theory of integrated object-oriented query/programming languages and, we believe, contributes a lot to the understanding of OQL semantics.
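As an illustration of how a query operator can be defined through the environment stack, the following Python sketch expresses selection (where) as: for each element, push a section with binders to the interior of the element, evaluate the predicate by binding names on the stack, then pop the section. This is a simplified sketch under our own assumptions (objects as dictionaries, one binding per name, the helper names nested, bind and where); it is not the formal definition given in [31].

```python
def nested(obj):
    """Binders to the interior of an object: attribute name -> value."""
    return dict(obj)


def bind(stack, name):
    """Bind a name by searching the stack from the top."""
    for section in reversed(stack):
        if name in section:
            return section[name]
    raise NameError(name)


def where(stack, collection, predicate):
    """Selection defined through stack operations."""
    selected = []
    for element in collection:
        stack.append(nested(element))   # open the scope of this element
        try:
            if predicate(stack):
                selected.append(element)
        finally:
            stack.pop()                 # close the scope
    return selected


emps = [{"name": "Ann", "sal": 900}, {"name": "Bob", "sal": 1500}]
# Base section: a database name and a variable of the surrounding program.
stack = [{"Emp": emps, "limit": 1000}]

# Emp where sal < limit
low_paid = where(stack, bind(stack, "Emp"),
                 lambda s: bind(s, "sal") < bind(s, "limit"))
```

In the predicate, sal is bound in the section pushed by where, while limit is found one section below; the same binding procedure serves both names.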

2.4 Mutable and Immutable

The ODMG-93 object model and typing system make the distinction between "mutable" and "immutable" objects. The idea is to distinguish between objects that can and cannot be updated. Our doubts concern some misunderstanding of the semantics that is behind this distinction. The term "mutable" usually concerns parameters of procedures, not objects. It means that such a parameter is a reference to an object that is outside the procedure; the programmer is allowed to update this object through this parameter. The ODMG terminology mixes two different semantic features, namely, the possibility to update an object and the possibility to operate on a reference to an object. To make the semantics and terminology clear we propose to distinguish the following concepts:
- Objects stored in some data repositories, for example, database objects, volatile objects, C++ objects, variables, etc. Objects constitute the state and can be updated.
- Constants stored in some data repositories. Constants constitute the state, but they cannot be updated. (Sometimes, e.g. in C and C++, they are textual macros, hence do not influence the state.)
- L-values or references that can be returned by OQL queries. Such references are used by updating statements (update, delete, etc.) and as call-by-reference parameters of procedures.
- R-values or simply values that can be returned by OQL queries. Values can be atomic (e.g., integer) or complex (tuple, record, etc). Such values are referred to as "immutable", as they cannot be used on the left side of an assignment, as arguments of a delete statement, etc. The ODMG concept of a complex value causes doubts [32]; because of the space limit we skip the discussion.
Queries in languages such as OQL may return structures that combine L-values and R-values. For example, the query select x, (x.sal - 100) from Persons as x returns a structure consisting of pairs, where the first element is a reference to a Person object, and the second one is an integer value.
Such structures cannot be precisely reflected by the ODMG typing system, since they comprise both "mutable" and "immutable" elements. If one assumes that such queries can be used as parameters of procedures or methods, then there is room for several decisions concerning parameter passing (e.g., some combination of call-by-value with call-by-reference).
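The mixed result of the query select x, (x.sal - 100) from Persons as x can be sketched in Python as below. The Ref class, the dictionary-based store and the function name are assumptions of this illustration; they only show that the first element of each pair supports updating and deleting, while the second is a plain integer.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """An L-value: a location in the store, identified by an object id."""
    store: dict
    oid: int

    def get(self):
        return self.store[self.oid]

    def delete(self):
        del self.store[self.oid]


store = {1: {"name": "Pat", "sal": 1000}, 2: {"name": "Kim", "sal": 700}}


def select_x_and_reduced_sal(store):
    """Pairs of (reference to a Person object, integer value)."""
    return [(Ref(store, oid), person["sal"] - 100)
            for oid, person in store.items()]


pairs = select_x_and_reduced_sal(store)

first_ref, first_value = pairs[0]
first_ref.get()["sal"] = first_value   # updating through the L-value
pairs[1][0].delete()                   # deleting through the L-value
```

The integers 900 and 600 remain ordinary values; nothing in the store can be reached or changed through them.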

2.5 Identifiers of Attributes

The ODMG model assumes that attributes have no identifiers. We consider this another doubtful feature that makes it difficult to integrate OQL with programming features. Assume we have to develop the assignment, in the style of the SQL update statement:

update Persons where sal < 1000 set sal = sal +100

In PLs every update (assignment) requires an L-value, i.e., a reference to the updated unit. In this case we need a reference to a sal attribute. If attributes have no identifiers then we are unable to build references to them, hence the semantics of updating becomes problematic. In the ODMG model attribute identifiers can be constructed as pairs; however, this assumption is inconsistent for set-valued attributes, and is a bit clumsy for sub-attributes, sub-sub-attributes, etc. It is also possible that there is simply a terminological problem: ODMG identifiers denote persistent object identities, while identifiers of attributes have a meaning only for a particular implementation, for example, they are physical locations, OIDs+offset, etc. However, the statement that attributes have no identifiers can be interpreted as the impossibility of building a reference to an attribute value. In our opinion, this could be a source of serious problems with the definition of correct semantics of updating. In the next section we recall two principles of PLs, the store principle and the semantic relativity principle. Lack of internal identifiers for attributes violates both of them. For example, if attributes have no identifiers then a single attribute value cannot be locked. This could be unacceptable because of the required level of concurrency. According to the second principle, every nested environment should have the same semantic properties as its parent environment. The internal content of an object is such a nested environment. For example, assume that within a Person object there is a repeating complex attribute JobHist:

set

Consider the query: find persons who used to work for IBM for at least 10 years. The query can be constructed in two steps: (1) Within an internal object environment we calculate how long a person worked for IBM: sum(select x.Till - x.From from JobHist as x where x.Comp = "IBM") (2) Then the query is nested into the whole query:

select y.Name from Persons as y where sum(select x.Till - x.From from y.JobHist as x where x.Comp = "IBM") >= 10

Semantic differences between objects and attributes mean that the semantics of nested and parent queries must be expressed in different terms. This causes unnecessary growth of the OQL definition, the manual, and the implementation effort. Semantic relativity and orthogonal persistence [4] imply the admission of persistent objects that have only atomic values and no attributes.
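The two-step construction can be mirrored in Python, where the inner computation is written once and works in the nested environment exactly as it would at the top level. The sample data and the assumption that From and Till are integer years are ours; the attribute names Comp, From, Till and Name follow the queries above.

```python
persons = [
    {"Name": "Ann",
     "JobHist": [{"Comp": "IBM", "From": 1980, "Till": 1986},
                 {"Comp": "DEC", "From": 1986, "Till": 1990},
                 {"Comp": "IBM", "From": 1990, "Till": 1995}]},
    {"Name": "Bob",
     "JobHist": [{"Comp": "IBM", "From": 1990, "Till": 1994}]},
]


def years_at(job_hist, comp):
    """Step (1): sum(select x.Till - x.From from JobHist as x where x.Comp = comp)."""
    return sum(x["Till"] - x["From"] for x in job_hist if x["Comp"] == comp)


def veterans(persons, comp, years):
    """Step (2): the inner query nested into the query over Persons."""
    return [y["Name"] for y in persons
            if years_at(y["JobHist"], comp) >= years]


result = veterans(persons, "IBM", 10)
```

Because the collection of JobHist entries is treated like any other collection, no separate semantics is needed for the nested query.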

2.6 Side Effects of Queries

Although OQL has been designed as a purely retrieval language, a very strange decision was made concerning queries of the form Person( name:"Pat", sal:1000). Such a query creates an object, i.e., it changes the database state. Simultaneously, it returns (a reference to) the object; thus it can be combined with other queries. Although some programming languages make similar decisions concerning side effects of expressions (the best examples are C and C++, where even assignments are side effects of expressions), we argue that it is a doubtful feature for very-high-level languages. In lower level languages the feature might be justified by performance; however, this is not the case for query languages, where side effects of queries make the majority of known query optimization methods inapplicable. There are also conceptual disadvantages. Because such queries can be combined with other queries, their resulting semantic effect can be very difficult to understand. Lack of a clear separation of "expressions" (i.e. queries) and "statements" (i.e. imperative commands) makes the construction of the language obscure in many cases and causes unnecessary growth of the size of manuals and of the learning time. Such queries mix two different semantic features, hence introduce unnecessary conceptual disorder. In our opinion, creating new objects should be introduced on top of OQL, as one of several imperative constructs acting on OQL queries.
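The effect on optimization can be shown with a small Python sketch. The function Person below stands for a constructor query with a side effect; the functions naive and optimized are hypothetical evaluation strategies, the second being the classical rewriting that factors a loop-invariant subquery out of a loop. Both return the same result, but they leave the database in different states.

```python
db = []


def Person(name, sal):
    """A constructor 'query': creates an object and returns it."""
    obj = {"name": name, "sal": sal}
    db.append(obj)
    return obj


def naive(values):
    # The constructor is evaluated once per element of the collection.
    return [v for v in values if v < Person("Pat", 1000)["sal"]]


def optimized(values):
    # The subquery does not depend on v, so it is evaluated only once.
    limit = Person("Pat", 1000)["sal"]
    return [v for v in values if v < limit]


db.clear()
r1 = naive([500, 1500, 800])
created_naive = len(db)

db.clear()
r2 = optimized([500, 1500, 800])
created_optimized = len(db)
```

For a query without side effects the rewriting would be safe; here it changes the number of objects created from three to one.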

2.7 Auxiliary Naming

Assume two classes Dept(dname) and Emp(name,sal) connected by links employs; dname of Dept is not necessarily unique. Consider the query:

"For each department return its name and the names of its employees earning the highest salary in their departments". In SBQL it is an easy query: Dept.(dname, (m denotes max(employs.Emp.sal)). (employs.Emp where sal = m).name) The query displays several problems in the OQL syntax and semantics. One is the SQL-like syntax, which makes such queries awkward. Another one is the group by construct, which is poorly specified and not sufficiently universal; e.g. the lack of grouping according to references to objects makes it impossible to apply grouping to this query. (In LOQIS we avoid grouping because it appears to be covered by the dependent (navigational) join, which is orthogonal to other query operators.) The query can be solved by introducing an auxiliary name, as in the above SBQL example; unfortunately, auxiliary naming in OQL is a feature with imprecise semantics, thus there is no way to verify whether a given solution is correct. The OQL operator as occurs in two contexts (in from and select clauses), perhaps with two different meanings. We suggest that (as in SBQL) the auxiliary naming should be specified precisely and uniformly, and should be allowed in all contexts of a query [31].
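The role of the auxiliary name m can be seen in a Python rendering of the same query. The sample data and the function name are assumptions of this sketch; departments are processed one by one (by reference, not by dname), which is why two departments with the same name cause no difficulty.

```python
depts = [
    {"dname": "Toys",
     "employs": [{"name": "Ann", "sal": 900},
                 {"name": "Bob", "sal": 1200},
                 {"name": "Cid", "sal": 1200}]},
    {"dname": "Toys",                       # dname is not unique
     "employs": [{"name": "Dan", "sal": 700}]},
]


def top_earners(depts):
    result = []
    for d in depts:
        # m denotes max(employs.Emp.sal); None for a department with no employees
        m = max((e["sal"] for e in d["employs"]), default=None)
        names = [e["name"] for e in d["employs"] if e["sal"] == m]
        result.append((d["dname"], names))
    return result


result = top_earners(depts)
```

The name m is introduced once per department and is visible only while that department is processed, which corresponds to the scope of the auxiliary name in the SBQL query.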

3 Seamlessly Integrated OQL/PL

In the following we briefly discuss features which might be introduced to OQL integrated with programming capabilities (OQL/PL). Then we present some guidelines that should be observed during the design.

3.1 Features of OQL/PL

The most natural way to extend OQL to a programming language is to follow the current SQL paradigm, which assumes that queries are components of imperative statements such as update, insert and delete. Following the orthogonality and minimality principles, OQL queries should play the part of generalized program expressions, which comprise classical expressions, e.g., 2 + 2 or X  A[n + 1], but can also return complex structures consisting of references, values and names. Such queries can be used within imperative macroscopic statements: assignments, creations of data objects, deletions, insertions. "Macroscopic" means that the statements act on a structure determined by a query in a quasi-parallel way. For example, we can propose the following SQL-like updating statement: update query1 set {query2 = query3; ...} In this syntax query1 returns references to objects that have to be updated, query2 returns a reference to some of their attributes, and query3 returns a value. For example:

update select * from Emp as x where x.sal < 1000 set { x.sal = avg(Emp.sal); x.status = "OK"; }

Note that a combination of the two imperative words update and select (and other sugar) looks unnatural (sometimes even ugly); compare the following syntax (derived from LOQIS): (Emp where sal < 1000). (sal := avg(Emp.sal); status := "OK"); Such statements can be nested into standard control statements: if...then...else, case, while...do, for each...do, etc.; for example:

for each x in select y from Emp as y where y.sal < 1000 do { print(x.name); delete x; };
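A macroscopic update of the form update query1 set {query2 = query3; ...} can be sketched in Python as follows. We assume here one possible reading of "quasi-parallel": all right-hand sides are evaluated against the unchanged state before any assignment is performed. This reading, the function update and the sample data are assumptions of the sketch, not a definition taken from LOQIS or ODMG-93.

```python
def update(targets, assignments):
    """update query1 set { query2 = query3; ... }

    targets     -- the objects selected by query1
    assignments -- pairs (attribute name, function computing the new value)
    """
    # Phase 1: evaluate every right-hand side against the unchanged state.
    planned = [(obj, attr, rhs(obj))
               for obj in targets
               for attr, rhs in assignments]
    # Phase 2: perform the assignments.
    for obj, attr, value in planned:
        obj[attr] = value
    return len(targets)


emp = [
    {"name": "Ann", "sal": 900, "status": ""},
    {"name": "Bob", "sal": 1500, "status": ""},
    {"name": "Cid", "sal": 600, "status": ""},
]


def avg_sal():
    return sum(e["sal"] for e in emp) / len(emp)


# update select * from Emp as x where x.sal < 1000
#   set { x.sal = avg(Emp.sal); x.status = "OK"; }
updated = update([x for x in emp if x["sal"] < 1000],
                 [("sal", lambda x: avg_sal()),
                  ("status", lambda x: "OK")])
```

With a one-phase loop the second selected employee would receive an average already influenced by the first update; the two-phase version gives both the same value.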

OQL queries and statements based on them can be used as building blocks for procedures, functional procedures, views, methods, virtual attributes, integrity constraints, active rules, and ADT operators. It should be possible to define a procedure where the parameter, the body, and the output are each determined by queries. For example, the following procedure returns a structure (N,S,D) consisting of references to name, sal and department names of employees of the given jobs that earn less than the average (the syntax is ad hoc):

OutType Underpaid( JobsType j ) {
    short a;
    a = avg(Emp.sal);
    return select x.name as N, x.sal as S,
                  (select y.dname from x.works_in as y) as D
           from Emp as x
           where x.job in j and x.sal < a;
};

The procedure can be used as a parameterized view:

select z.N from Underpaid("clerk") as z where z.D = "Sales"

Because the view returns references to attributes it can also be used for updating:

update select * from Underpaid("clerk" union "designer") as t where t.D = "Toys" set t.S = t.S + 100;
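The idea of a functional procedure that returns references, and can therefore serve both as a view and as a target of updating, can be sketched in Python. The Ref class, the sample data and the use of a floating-point average (instead of the short variable of the ad hoc syntax) are assumptions of this illustration.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Reference to an attribute: the owning object plus the attribute name."""
    owner: dict
    key: str

    def get(self):
        return self.owner[self.key]

    def set(self, value):
        self.owner[self.key] = value


emp = [
    {"name": "Ann", "sal": 800, "job": "clerk", "works_in": [{"dname": "Toys"}]},
    {"name": "Bob", "sal": 950, "job": "designer", "works_in": [{"dname": "Sales"}]},
    {"name": "Cid", "sal": 3000, "job": "clerk", "works_in": [{"dname": "Toys"}]},
]


def underpaid(jobs):
    """Combines persistent objects (emp), a local variable (a) and a parameter (jobs)."""
    a = sum(e["sal"] for e in emp) / len(emp)
    return [{"N": Ref(x, "name"),
             "S": Ref(x, "sal"),
             "D": [Ref(y, "dname") for y in x["works_in"]]}
            for x in emp if x["job"] in jobs and x["sal"] < a]


# Used as a view: names of underpaid clerks.
clerk_names = [z["N"].get() for z in underpaid({"clerk"})]

# Used for updating: raise sal of underpaid clerks and designers in "Toys".
for t in underpaid({"clerk", "designer"}):
    if any(d.get() == "Toys" for d in t["D"]):
        t["S"].set(t["S"].get() + 100)
```

The update reaches the stored sal attribute because the view delivers a reference to it rather than a copy of its value.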

Note that the query within the procedure Underpaid uniformly combines persistent objects, a local variable, and the procedure's parameter. This is a feature of LOQIS, very different from OQL, which deals with database objects only. As seen, a view is simply a functional procedure. For view updating some additional features are necessary; this is a subject of our present research. A method is a procedure or a functional procedure that is stored within a class and executed in the environment of a class instance. Below we present the method increase_sal, which checks the budget and then increases sal for an Emp object that is the receiver of a message increase_sal(rise): boolean increase_sal( in short rise ) { boolean Allowed; Allowed = sum(Emp.sal) + rise
