
University of Ljubljana
Faculty of Electrical Engineering and Computer Science

Iztok Savnik

Query Language for Complex Database Objects

Doctoral dissertation

Thesis supervisor: Doc. Dr. Tomaz Mohoric

Ljubljana, 1995

Abstract

In this thesis, we address the design of an object algebra whose operations are intended to form the basis of OVAL, a functional query language for objects. The proposed algebra is closely related to the object-oriented database model. The algebra and the corresponding functional language are designed to be suitable for integration with an object-oriented database programming language. The main contributions of the work are: (i) the definition of the object algebra, (ii) the design of a functional operation for querying nested components of complex objects, (iii) the definition of a set of operations for querying the database schema, and finally, (iv) the design of a procedure for static type checking of OVAL queries.

In order to provide the basis for the definition of the object algebra operations, we describe the object-oriented model in a formal manner. The proposed formalization of the object-oriented model unifies the schema and the instances of the database. Consequently, the database is perceived as a uniform set of objects. This view allows a clear and uniform formal treatment of the basic features of the data model and a definition of simple operations for querying the database conceptual schema. The behavior of objects is described using methods. The salient feature of the formalization is the restriction of attribute and method overriding, which serves as a necessary property for the definition of the type-checking procedure for OVAL queries.

The operations of the proposed object algebra derive from relational algebra, nested relational algebra and functional languages. We believe that complex composite objects and the ability to inquire about the conceptual schema are not adequately supported by recent object algebras. Accordingly, we propose the use of a higher-order function, apply_at, for querying nested components of objects. The proposed operation can apply any query to an arbitrarily nested component of an object. Next, we define a set of primitive operations for querying the database conceptual schema. These operations are defined by means of constructs provided by the data model formalization.

The operations of the object algebra can be combined into queries by the function composition operation. The derived query language can be treated as a generalization of the functional query language FQL [16] to the manipulation of objects. The integration of the proposed functional query language with an object-oriented database programming language is studied by means of a prototype implemented as an extension of the E database programming language. The functional nature of OVAL makes the language suitable for integration with procedural languages. In addition, the OVAL syntax and semantics stimulate the construction of queries in a step-by-step manner, which we find appropriate for expressing queries on complex objects. Such a method of query definition forces the programmer to split a complex problem into a sequence of simpler subproblems, whose solutions can be readily composed into a more complex query. In comparison to SQL-based object-oriented query languages, the proposed query language offers a more procedural solution which is also easier to understand.

The type checking of OVAL queries and some other properties of the object algebra are studied in a Prolog-based prototype. Unlike the previously mentioned implementation, this one is not tied to any particular type system and is flexible enough to allow experimentation with the designed language. The type checking procedure is defined in the form of a set of type checking rules. The procedure can derive the type of a query result statically in the presence of the substitutability principle by respecting the constraints imposed by the OVAL data model.

Keywords: databases, object-oriented databases, database models, conceptual models, data model formalization, complex objects, database algebras, object algebra, query languages.


Zahvala (Acknowledgements, translated from the Slovenian)

First, I would like to sincerely thank my supervisor, Doc. Dr. Tomaz Mohoric, for his expert help and useful advice during the work on this dissertation. I also thank my co-supervisor, Prof. Dr. Bostjan Vilfan, for his useful comments on the draft of the dissertation.

I am grateful to all colleagues at the Department of Computer Science and Informatics of the Jozef Stefan Institute for their help during my doctoral work. Doc. Dr. Franc Novak helped me with the dissertation and guided my work at the Jozef Stefan Institute for many years, for which I sincerely thank him. I thank Vanja Josifovski for his work on the implementation of the OVAL language, for many useful conversations, and for his encouraging enthusiasm for the work on OVAL. I am grateful to Alenka Zuzek for her help with the final stage of the doctoral work.

I am likewise grateful to all colleagues I worked with during my study visit to the School of Information Systems, Queensland University of Technology, for their help with the doctoral work. I thank Dr. Zahir Tari for his guidance in the work on the object algebra. Without his support and knowledge I would not have completed the thesis in its present form. I am grateful to Prof. Dr. Mike Papazoglou for his advice and help with the dissertation and for the pleasant working environment during my study visit.

My warmest thanks also go to my parents, my sister and Vanja for their support and help.

The Jozef Stefan Institute provided a suitable environment for the research work. The work on the doctoral dissertation was financed by the Ministry of Science and Technology of the Republic of Slovenia.


Acknowledgements

First of all, I would like to express my thanks to my supervisor, Doc. Dr. Tomaz Mohoric, for supervising me and giving me much advice during my work on this thesis. I also thank my co-mentor, Prof. Dr. Bostjan Vilfan, for his useful comments on the draft version of this thesis.

I thank my colleagues from the Computer Science Department at the Jozef Stefan Institute for their help during the work on this thesis. First, I am thankful to Doc. Dr. Franc Novak for helping me with the final version of this thesis and for guiding me through my work at the Jozef Stefan Institute. For his work on the OVAL implementation, many useful conversations and his encouraging enthusiasm for the work on OVAL, I am grateful to Vanja Josifovski. Finally, I thank Alenka Zuzek for helping me to make the final corrections to the thesis.

I would like to express my thanks to colleagues from the School of Information Systems, Queensland University of Technology, for their kind support during my visit. Most of all, I am thankful to Dr. Zahir Tari for guiding me through the work on the algebra during the last year. Without his support and knowledge, this work would not have reached its current stage. For his advice concerning the work on my thesis, and for giving me the opportunity to work in a stimulating environment, I am grateful to Prof. Dr. Mike Papazoglou.

I would also like to express my gratitude to my parents, my sister and Vanja for their support and help.

The Jozef Stefan Institute provided a convenient environment for the research work. The work on this thesis was financially supported by the Ministry of Science and Technology of the Republic of Slovenia.


Contents

Abstract
Zahvala
Acknowledgements

1 Introduction
    1.1 Motivations
        1.1.1 Database algebras and database programming languages
        1.1.2 Unified view of a database
        1.1.3 Querying nested components of objects
    1.2 Outline of thesis

2 Background and related work
    2.1 Data models
        2.1.1 Object-oriented database model
        2.1.2 The O2 data model
        2.1.3 The IQL data model
        2.1.4 F-Logic data model
        2.1.5 EXTRA data model
    2.2 Database algebras
        2.2.1 Relational algebra
        2.2.2 Nested relational algebra
        2.2.3 Encore/Equal algebra
        2.2.4 Complex Object Algebra
        2.2.5 LDM algebra
    2.3 Functional database query languages
        2.3.1 FQL
        2.3.2 GDL and O2FDL
        2.3.3 FAD

3 Data Model Formalization
    3.1 Introduction
    3.2 Structural model
        3.2.1 Object and o-value
        3.2.2 Classes and objects
        3.2.3 Types and classes
    3.3 Modelling behavior
        3.3.1 Signature properties
        3.3.2 Signature poset
    3.4 Concluding Remarks

4 Object Algebra
    4.1 Introduction
    4.2 Example
    4.3 Model-based operations
        4.3.1 Valuation operator
        4.3.2 Extension functions
        4.3.3 Comparison operations based on o-value poset
        4.3.4 Closure operations
        4.3.5 The nearest common superclasses and subclasses
        4.3.6 Equality
    4.4 Algebra operations
        4.4.1 Apply
        4.4.2 Selection
        4.4.3 Set operations
        4.4.4 Close
        4.4.5 Tuple
        4.4.6 Group
        4.4.7 Unnest
        4.4.8 Querying object components
    4.5 Expressing Operations of Other Algebras
    4.6 Concluding remarks

5 Prototype Implementations
    5.1 Introduction
    5.2 Prolog based prototype
        5.2.1 Examples
        5.2.2 Type checking
    5.3 Extending E database programming language
        5.3.1 Integrating databases and programming languages
        5.3.2 E database programming language
        5.3.3 Integrating OVAL and E DBPL
    5.4 Conclusions

6 Conclusions
    6.1 Summary
    6.2 Contributions
    6.3 Further work

Bibliography

A Razsirjeni povzetek (Extended abstract)

B Type checking rules

C Kratek pojmovni slovar (Short glossary)

Chapter 1

Introduction

Recent object-oriented database management systems (OO DBMS) [27, 19, 57, 29] demonstrate that newly developed database technology actually subsumes the functionality of relational DBMS. They can store huge amounts of data in a distributed environment, offer a query language which in most cases subsumes its SQL or QUEL relational equivalents, and provide a database programming language that closely integrates a high-level programming language with the database. The data modelling and language facilities of OO DBMS allow efficient representation and manipulation of complex objects occurring in various engineering programming environments, including business, design, production, office automation and others.

There are a number of unresolved research problems relating to OO DBMS. The object-oriented data model still lacks a clear formalization [14], although there have been many attempts in this direction [48, 3, 52, 43, 14, 39]. Such a formalization could serve as a tool for further study of the properties of the data model and of the corresponding query language. The difficulty in defining an appropriate formal basis for the object-oriented model stems from its complexity. The model integrates constructs for the description of the static structure and the behavior of objects, and is based on certain principles, including inheritance, polymorphism, encapsulation and property overriding [9, 15].

While the main features of the object-oriented data model are relatively widely accepted [43], the main characteristics of an object algebra have not yet been identified. Recently proposed algebras [5, 46, 82, 70, 52, 50, 77, 80, 34] are quite varied in approach and sometimes differ significantly in their basic sets of algebraic operations. This may be due to the different starting points of the algebras, if we suppose that the target algebraic structures are fixed, i.e. complex objects [9, 15] structured by means of tuple, set or similar type constructors (footnote 1). The algebras which have the strongest influence on object algebra, and which mainly serve as the starting points for its design, are relational algebra [21], nested relational algebra [62, 1] and functional languages [10, 33].

In this thesis, we address the design of an object algebra whose operations are intended to form the basis for OVAL, a functional query language for objects. We claim that some aspects of the object-oriented model, in particular complex composite objects and the database conceptual schema (footnote 2), are not adequately supported by recent object algebras. The following aspects are studied:

- the formalization of the object-oriented data model, which would assist in the definition of the object algebra,
- the operations for querying nested components of complex objects,
- the use of the database schema for querying the database, and
- the design of the type checking algorithm for query expressions.

Footnote 1: We are not concerned with database algebras intended to manipulate so-called bulk data structures, i.e. trees, graphs and similar.

Footnote 2: The database conceptual schema, or database schema, is a logical description of a database using the constructs of a data model. In the context of an object-oriented database, the conceptual schema is specified by a set of classes and corresponding types.

The developed formalization of the object-oriented data model extends the ideas of the IQL [3] and O2 [48] object-oriented data model formalizations by treating classes in the same way as ordinary objects. The database can now be perceived as a uniform set of objects, including class objects and ordinary objects. The advantage of such a formal treatment of a database is the ability to use the conceptual schema and the instance parts of the database within the same formal view. Furthermore, this unified view of the database allows a definition of the set of operations for querying the schema part of the database.

From the data model formalization point of view, treating classes as ordinary objects establishes a close integration of the class and the type concepts. The structural part of the standard notion of a type [18], for example, is defined as a structured value whose components are class objects. Next, the type interpretation can be defined as an extension of the class interpretation.

The behavior of objects is described using methods. The properties of methods are described by signatures, which serve to model the method interface. We describe the signature interpretation, behavioral inheritance, method overriding and multiple inheritance conflicts. One of the salient features of the data model formalization is the restriction of attribute and method overriding, which serves as a necessary property for the definition of the type-checking procedure for OVAL queries.

The object algebra is designed in accordance with the data model formalization. It is based on the relational and nested relational algebras. The differences between the object algebra and the relational algebras are mainly due to the greater expressive power of the object-oriented data model. The operations of the object algebra are divided into model-based operations and basic algebraic operations.

The model-based operations are used for manipulating object properties described by data modelling constructs. These operations are based on the concepts introduced by the data model formalization. They serve as a tool for manipulating object values, for comparing objects, and for manipulating the database conceptual schema. The operations provide a simple means of browsing through the conceptual schema and relating instance objects to it.

The basic algebraic operations are used for querying, restructuring and altering the contents of objects stored in a database. These operations use the model-based operations as a tool for manipulating object properties. The algebraic operations are designed to be readily understood and to cover the user's needs for querying a database. Every basic algebraic operation is a function, and these functions can be combined into queries using a function composition operation. In this way, we obtain an algebraic query language whose functionality is similar to that of SQL-based object-oriented query languages [11, 29, 42].

The salient features of the proposed algebra and the functional query language are as follows:

- Due to its functional semantics, the algebraic query language is appropriate for integration with a database programming language based on C++ or the like.

- The OVAL syntax and semantics stimulate the construction of queries in a step-by-step manner, which we find appropriate for expressing queries on complex objects. Such a means of query definition forces the programmer to divide a complex problem into a sequence of simpler subproblems, the solutions of which can be simply composed into a more complex query.

- A simple means of querying nested components of composite objects is provided by a new operation, apply_at. The suggested operation for querying object components is a generalization of the well-known function ApplyToAll (footnote 3), originally used in functional programming languages [10].

- Finally, the information that relates to the conceptual schema of the database can be utilized using the previously mentioned model-based operations.
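The step-by-step style of query construction described above can be illustrated with a minimal sketch. It is written in Python rather than in OVAL, and the names compose, select and apply_to_all, as well as the sample data, are assumptions made for this illustration only; they are not the OVAL operations defined later in the thesis.

```python
from functools import reduce

def compose(*steps):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def select(predicate):
    """Return a function that keeps the objects satisfying the predicate."""
    return lambda objects: [o for o in objects if predicate(o)]

def apply_to_all(fn):
    """Return a function that applies fn to every object of a collection."""
    return lambda objects: [fn(o) for o in objects]

# Hypothetical sample data, not taken from the thesis.
employees = [
    {"name": "Ann", "dept": "R&D", "salary": 3200},
    {"name": "Bob", "dept": "Sales", "salary": 2100},
    {"name": "Cid", "dept": "R&D", "salary": 2700},
]

# Each subproblem is solved by one small function ...
rd_only = select(lambda e: e["dept"] == "R&D")
names = apply_to_all(lambda e: e["name"])

# ... and the solutions are composed into the final query.
rd_names = compose(rd_only, names)

print(rd_names(employees))  # ['Ann', 'Cid']
```

Because every step is an ordinary function, an intermediate result such as rd_only(employees) can be inspected on its own before the next step is added.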

1.1 Motivations

The motivation for the work on the formalization of the data model and the design of the object algebra, outlined briefly in the introduction, is described in more detail in this section. The motivations can be roughly split into three main aspects. Firstly, we aimed to design a database query language which could be integrated with existing database programming languages based on C++. Secondly, one of the main advantages of recent object-oriented models over relational and nested relational models [62, 1] is the ability to represent complex structural objects. We claim that alternatives to the existing algebraic operations for querying the nested properties of complex structural objects should be suggested. Thirdly, due to the rich set of data modelling constructs for representing the structure of stored objects, we consider that the object algebra should include operations for querying the database conceptual schema. A more detailed argument in favor of the stated motivations for the design of the object algebra is given in the following three sections.

Finally, we mention that most approaches to the design of object algebras [81, 70, 5, 46, 50, 52, 77] do not agree on the basic operations of the algebra. Therefore, it is also interesting to investigate the common principles of existing object algebras and to define a set of operations that would cover most of the existing approaches.

1.1.1 Database algebras and database programming languages

One of the initial motivations in the design of the OVAL object algebra is its use as the basis for a declarative query language. The declarative language should be appropriate for integration with database programming languages based on C++, e.g. the database programming language E [23]. From this point of view, the algebraic language has certain advantages over declarative languages based on logic, e.g. F-Logic [41]. One of these is the functional nature of the language, which fits the syntax and semantics of a database programming language. However, the same advantage has been claimed to make a language less declarative than query languages based on logic [33].

The functional nature of the algebraic query language, together with the ability to compose algebraic operations using a composition operator, allows the semantics of path expressions to be extended to serve as the platform for the query language syntax. In addition, the algebra has the property of referential transparency [33], which allows a query to be rewritten in many different forms. This property can be exploited to optimize the performance of query evaluation. The effect of referential transparency is even more apparent because the algebraic operations are designed for use in a declarative language: we do not intend to design a minimal set of operations; instead, the functionality of some sets of operations may overlap to make queries easier to construct.

Footnote 3: The function ApplyToAll is also known by the names maplist [33], set_apply [82], replace [5], etc.
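As a small illustration of how referential transparency permits query rewriting, the following Python sketch shows two selections applied in sequence and an equivalent single selection. The rewrite is valid here only because the predicates are pure functions without side effects. The helper names and the sample data are hypothetical and serve the illustration only; they do not reproduce OVAL operations or any optimizer described in the thesis.

```python
def select(predicate):
    """Return a function that keeps the objects satisfying the predicate."""
    return lambda objects: [o for o in objects if predicate(o)]

def compose(first, second):
    """Apply `first`, then `second`."""
    return lambda value: second(first(value))

def is_adult(person):
    return person["age"] >= 18

def is_student(person):
    return person["kind"] == "student"

# Original form of the query: two passes over the data.
two_passes = compose(select(is_adult), select(is_student))

# Rewritten form: one pass with a conjunctive predicate.
one_pass = select(lambda p: is_adult(p) and is_student(p))

# Hypothetical sample data.
people = [
    {"name": "jim", "age": 22, "kind": "student"},
    {"name": "eva", "age": 17, "kind": "student"},
    {"name": "bob", "age": 40, "kind": "employee"},
]

# Both forms denote the same result, so either may replace the other.
assert two_passes(people) == one_pass(people)
print([p["name"] for p in one_pass(people)])  # ['jim']
```

An evaluator is free to choose whichever of the two equivalent forms is cheaper, which is the kind of freedom the paragraph above refers to.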

1.1.2 Unified view of a database

Object-oriented database systems store information about the modelling environment in terms of data objects and the conceptual schema. Due to the rich modelling constructs available through the object-oriented data model, the conceptual schema of an object-oriented database is more expressive than a relational database schema. We observe that, comparing an object-oriented database to a relational one, some information about the modelling environment has been moved from the "instance" part to the "schema" part of the database. Let us now justify this premise using some examples.

The classification properties of objects are represented in the relational database model using one or more attributes, the values of which define the class to which the object belongs. In the object-oriented model, we represent the same situation by creating an inheritance hierarchy of classes. Let us consider the classical example of the classification of person into the classes student, employee, etc. To inquire about the classification properties of a particular object, one must relate it to the inheritance hierarchy of classes. A question could be: does the object jim belong to the class student? Next, we would like to filter a set of objects so that it includes only objects that are below or above some class in terms of the inheritance hierarchy. For example, select all objects that are instances of classes more general than, i.e. above, the class phd student. In this way, we obtain only instances of the classes student and person. The common feature of these queries is that they relate properties of data objects to the database schema.

Next, some aspects of the object composition abstraction can be represented using data model constructs. In the pure object-oriented database model [9], composite objects are modelled by object identifiers and complex attributes. The EXCESS database model [81] introduces the own and ref constructs for expressing the semantics of composite objects. The ref construct denotes a reference between objects. The own construct can represent the fact that a given component object is actually part of exactly one composite object; it cannot be referenced by any other database object. Furthermore, in some data models the composition of objects is modelled explicitly, e.g. the SAM and OSAM* data models [75, 76], the XOOS data model [51] and the CDM data model [63]. These models provide the ability to make a semantic distinction between composition or aggregation relationships and associative links among objects. The user should be able to inquire about the composition structure of objects. Such facilities would be needed both where the model provides constructs for modelling the composition structure explicitly, and where this information is not provided by the modelling constructs.

From the previous discussion we conclude that in a rich modelling environment, the schema part of the database has to be treated in a similar manner to the data part of the database: it is, like ordinary data, the subject of the user's inquiry and modification. One way to provide simple access to the schema information is to design a data model that unifies the perception of the schema and the data part of the database. From the query language point of view, we can reach the following two conclusions. First, we should be able to relate instances to the schema information, so that we can use the properties of an object that pertain to its relationship to the conceptual schema. Second, because the conceptual schema is frequently very complex, the user would like to query it in order to obtain a precise mental image of the structure and the behavior of the stored information [58]. Therefore, a query language should include constructs that allow the user to query the conceptual schema.
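The classification queries discussed in this section can be sketched in a few lines of Python. The sketch assumes a toy representation in which the schema is a mapping from each class to its direct superclasses and the instance part is a mapping from each object to its class. The identifiers SUPERCLASSES, CLASS_OF, ancestors, belongs_to and instances_above, and the objects other than jim, are hypothetical and are not the model-based operations of OVAL.

```python
# Hypothetical schema part: each class with its direct superclasses.
SUPERCLASSES = {
    "person": [],
    "student": ["person"],
    "employee": ["person"],
    "phd_student": ["student"],
}

# Hypothetical instance part: each object with the class it was created in.
CLASS_OF = {
    "jim": "student",
    "ann": "phd_student",
    "bob": "employee",
    "eva": "person",
}

def ancestors(cls):
    """All classes strictly above `cls` in the inheritance hierarchy."""
    found = set()
    pending = list(SUPERCLASSES[cls])
    while pending:
        current = pending.pop()
        if current not in found:
            found.add(current)
            pending.extend(SUPERCLASSES[current])
    return found

def belongs_to(obj, cls):
    """Does `obj` belong to `cls`, directly or through inheritance?"""
    own = CLASS_OF[obj]
    return own == cls or cls in ancestors(own)

def instances_above(cls):
    """Objects created in classes more general than `cls`."""
    above = ancestors(cls)
    return sorted(o for o, c in CLASS_OF.items() if c in above)

print(belongs_to("jim", "student"))    # True
print(instances_above("phd_student"))  # ['eva', 'jim']
```

Both queries consult the schema part and the instance part together, which is the situation the unified view of the database is meant to support.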

1.1.3 Querying nested components of objects

Another consequence of the rich set of modelling constructs of the object-oriented data model is the ability to define objects with a complex composition structure. Values in an object-oriented database can be structured by the arbitrary use of set and tuple type constructors. As a consequence, the object algebra must provide operations that can access and query nested properties of complex values. By accessing the nested properties of complex objects, we mean obtaining the values of the object components. This can be accomplished using path expressions [15] or quantified path expressions [42]. By querying nested properties, we mean filtering, restructuring or performing other operations on sets that are nested in the object value. Henceforth, we will distinguish between (i) querying nested sets which results in the modified argument object, and (ii) querying nested sets which results in the modified nested component itself.

Recent algebras provide several ways to perform operations on nested object components. Firstly, the uniform structure of nested relations can be utilized to define operations that manipulate nested properties. The relational algebra operations are extended in some NF² algebras [62, 1, 68] so that they are recursively evaluated at all levels of a nested relation. In the first step, the operation is performed on the outer level of the object. Afterwards, its evaluation is recursively transferred to the nested relations. Secondly, the NF² restructuring operations provide the means to flatten the nested structures. The desired operations can be performed on the unnested structures, and the results can be nested again to obtain the original structure of the relation. Still, for relations to be restructured without any loss of information, they must obey some additional properties [62, 35]. Another approach to selective access to the nested components of relations is to specify the path to the nested structure before every operation [22]. In this way, the operation evaluation is recursively transferred to the specified component of the relation.

Most object algebras provide a rich set of operations for restructuring structured values, including the nested relational nest and unnest operations, the flatten operation used for flattening a set of sets, and the group operation used for grouping objects from the argument set [70, 50]. As in the case of nested relational algebras [62, 1], restructuring operations can be used to access nested properties of object values. However, all the disadvantages identified in their use for restructuring nested relations are also apparent in the object algebra. In addition, group and flatten are not "inverse" operations in the sense that the operations nest and unnest are: it is not possible to restore the original state after unnesting a set of sets, unless one or more special attributes are dedicated to capturing information about the particular grouping. Further, the use of restructuring operations as part of a declarative language for accessing nested properties of object values is tedious, since it requires first unnesting the target component, then applying the desired operation and then nesting the modified component again.

Another approach to the manipulation of nested components of the argument complex object is used in the Complex Object Algebra (COA) [5]. Object restructuring and access to the nested properties of a complex value are provided by the higher-order function replace, which realizes an implicit iteration over the elements of the argument set. The parameter function is applied to each element of the input set. The parameter function is defined by a functional notation called the replace specification, which subsumes all definable queries of COA. Basically, there are two ways in which queries can be applied to the nested components of the argument set elements. Firstly, the argument function of the replace operation can represent a structured query, i.e. a structure constructed by means of tuples and sets containing nested queries. This structured query then serves as a reconstruction plan which in addition defines the structure of the resulting set of objects. In order to manipulate nested components of the argument object, its structure has to be reconstructed up to the manipulated nested component. Secondly, queries can be applied to the nested components by nesting replace operations. In this case, the structure of the resulting objects always consists of one or more levels of nested sets that contain the components which were the target of the query evaluation. In general, the structure of the argument object is not retained when nested replace operations are used. These two approaches can be combined.

From the previous discussion, we conclude that alternative suggestions for the operations which provide access to and manipulation of nested components should be investigated. Of particular interest are simple declarative constructs that allow manipulation of the nested properties of an object and are appropriate to serve as a query language construct.
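To make the contrast discussed in this section concrete, the following Python sketch compares the unnest, select and nest sequence with a path-directed operation that applies a query to one nested component and retains the rest of the object. The function apply_at shown here is a hypothetical simplification that follows a path of tuple attributes only; it is named after the OVAL operation but is not its definition, which is given in Chapter 4. The data and the helpers unnest, nest and keep_large are likewise assumptions of this illustration.

```python
def apply_at(path, query):
    """Build a function that applies `query` to the component reached by
    following `path` (a list of attribute names) and returns a copy of the
    argument object in which only that component is changed."""
    def run(obj):
        if not path:
            return query(obj)
        head, rest = path[0], path[1:]
        updated = dict(obj)  # shallow copy; the argument is not mutated
        updated[head] = apply_at(rest, query)(obj[head])
        return updated
    return run

def unnest(depts):
    """Flatten departments into one row per (department, project) pair."""
    return [{"name": d["name"], **p} for d in depts for p in d["projects"]]

def nest(rows):
    """Regroup flat rows into departments with a nested project list."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["name"], []).append(
            {"title": r["title"], "budget": r["budget"]})
    return [{"name": n, "projects": ps} for n, ps in grouped.items()]

# Hypothetical sample data.
depts = [
    {"name": "R&D", "projects": [{"title": "A", "budget": 250},
                                 {"title": "B", "budget": 80}]},
    {"name": "Sales", "projects": [{"title": "C", "budget": 40}]},
]

def keep_large(projects):
    """The nested query: keep projects with a budget above 100."""
    return [p for p in projects if p["budget"] > 100]

# Restructuring approach: unnest, select on the flat rows, nest again.
via_restructuring = nest([r for r in unnest(depts) if r["budget"] > 100])

# Path-directed approach: query the nested component in place.
via_apply_at = [apply_at(["projects"], keep_large)(d) for d in depts]

print(via_restructuring)
print(via_apply_at)
```

In this sketch the restructuring approach drops the Sales department altogether, because a department whose nested set becomes empty leaves no rows to nest; this is one familiar instance of the information loss mentioned above. The path-directed version keeps Sales with an empty project set.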

1.2 Outline of thesis

Background and related work concerning the OVAL data model, algebra and query language are given in Chapter 2. The main features of the object-oriented database model and the formalizations of some database models which had the strongest influence on the formalization of the OVAL data model are presented. Next, the most influential recent database algebras are presented. First, an overview of relational and nested relational algebras is given. Further, we present some of the established algebras for objects and their relations to the OVAL algebra. At the end of this chapter, we briefly describe the family of functional query languages actually used as the basic platform for the design of the OVAL algebra and query language. The OVAL query language can thus be treated as a generalization of the functional query language FQL [16] for the manipulation of objects.

In Chapter 3, the formalization of the object-oriented database model is presented. Particular attention is paid to the design of the structural part of the database model. The formalization is built on a simple basis: the class objects are treated as ordinary objects.

In accordance with this view, other data model concepts are defined. We define the structural part of the type and the partially ordered sets of object identifiers and values. The interpretations of basic modelling constructs, i.e. classes and types, are given, and the properties of structural inheritance are presented. The behavioral view of the model is presented separately. Object behavior is described by means of methods which are represented using signatures. Some properties of signatures are given, and the behavioral inheritance is presented. The chapter finishes with a description of some areas for further work and some concluding remarks.

Chapter 4 presents the object algebra and corresponding functional query language OVAL. The properties of the algebra are viewed as a reflection of the object-oriented model characteristics. In particular, we study the consequences of the use of the inheritance principle and the ability to define complex composite objects. Each algebra operation is presented separately using a formal definition and a set of example queries. Particular attention is paid to: (i) the description of a new operation apply at, which serves for querying nested components of composite objects, and (ii) the description of operations intended to query schema information.

In Chapter 5, we briefly present two prototype implementations of the OVAL query language. First, the purpose and some results of the Prolog-based prototype are given. We describe the static type checking procedure for OVAL queries. The type checking procedure is defined in the context of the use of the substitutability principle. Only some rules are presented here; the complete set of type checking rules is presented in Appendix B. The second prototype is built as an extension of the database programming language E [23]. A brief overview of the E database programming language is given. Next, some aspects of this integration are presented and some implementation details are given.
Finally, in Chapter 6 we summarize the work and the contributions of this thesis, and present some areas for further work on the data model and query algebra.


Chapter 2 Background and related work

In this chapter we describe some background of the object-oriented data models and algebras, and present the data models, algebras and query languages which had the strongest influence on the design of the OVAL data model formalization and functional query language. The chapter is organized as follows: in the following section, the main concepts of the object-oriented database model are described and some recent database models and their formalizations are briefly overviewed. In Section 2.2, we first present the relational and nested relational algebras. Afterwards, some properties of database algebras related to the OVAL object algebra are overviewed. Finally, since the OVAL query language belongs to the family of functional query languages, brief descriptions of the functional query languages FAD [25], FQL [16] and some extensions of FQL are given in the last section.

2.1 Data models

The data model is a collection of conceptual tools for describing data, data relationships, data semantics, and data constraints [45]. Data models can be divided into three main groups: object-based, record-based and physical data models. The object-based data models are logical models used to represent data in an abstract way that is close to the user's perception of the modelling environment. Examples of object-based models are the entity-relationship data model [45], semantic data models [38] and the object-oriented data model [9]. Record-based models are used to represent the logical structure of the data, together with a higher-level description of the storage structures used for storing the represented data. Record-based data models include relational, network and hierarchical data models. The physical data model is used to specify the actual structures for storing data. In this work, we deal mainly with object-based data models.

This section is intended to provide an overview of some features of recent object-based data models and their formalizations. First, the properties of the object-oriented database model are presented. Next, we give a brief description of data models which influenced the design of the OVAL data model formalization presented in Chapter 3. In brief, the OVAL data model formalization is based on the formalizations of the IQL data model [3] and the O2 data model [48]. At the same time, the presented work can be seen as an integration of the F-Logic [41] view of the data model and the formalization, although we retain the sense of formalization used in [48] and [3]. Finally, some concepts of the OVAL data model are defined in a manner similar to that presented by Vandenberg in the EXCESS data model formalization [82]. Unfortunately, much work which is not directly related to the OVAL data model formalization is not presented.

2.1.1 Object-oriented database model

In object-oriented databases, objects are used to model any real or abstract entity. The object describes the static structure and the behavior of the modelled entity. In this section, we review some of the object-oriented database model properties considered mandatory by Atkinson et al. [9].

Object structure and identity

The simplest objects are integer numbers, strings, characters, and the like. More complex objects can be built from simpler objects using type constructors. The most commonly used type constructors are tuple, set and array. The use of type constructors should not be restricted by the data model. In other words, in the object-oriented model, the object can be structured by the arbitrary use of constructors. The structure of the object can be, for example, a set of sets, a set of tuples, etc. Every object has an identity which is unique in a given database. The identity of simple objects (e.g. integer numbers) is the same as their value. The identity of a structured object is represented by the object identifier (oid), which can be treated as the pointer to the object.


Classes and types

Objects with common properties are represented by classes. The class has a double role in the object-oriented data model [9]. Firstly, the class has the role of object factory. The object factory is used to create new objects by performing the operation new. Secondly, the class has the role of a warehouse. It serves as an abstraction of a set of its instances, which is usually called the class extension [15].

An object type is used to describe the properties of a set of objects. We distinguish between the static properties and the behavioral properties of objects. The static properties of the object represent the internal state of the object. The structure of the internal state of an object is described using the previously mentioned type constructors set, tuple and array. The behavioral properties of objects are modelled using the set of methods. The method is represented by a signature and an implementation. The signature of the method specifies the type of input parameters and the type of the result. The method is implemented in a programming language routine.

Inheritance

The main idea of inheritance is to provide a mechanism for sharing the structural and behavioral properties defined by existing database classes. The inheritance of properties is based on the inheritance hierarchy of classes, which is defined using the isa or the subclass relationship. For example, the class student is the subclass of the class person (student isa person). The subclass inherits all properties of its superclasses. Single inheritance allows the class to have only one directly associated superclass, also called the parent class. Multiple inheritance allows the class to have more than one directly associated superclass. For example, the class student assistant can be defined as a subclass of classes student and assistant. In this way, the class student assistant inherits the properties of both parent classes.

With both single and multiple inheritance, an object can inherit more than one method or attribute with the same name. In this situation, the closest method (or attribute) in terms of the inheritance hierarchy of classes is selected when the method or the attribute in conflict is evaluated. This property of inheritance is called overriding. A problem can arise in the case of multiple inheritance. The multiple inheritance conflict arises when two or more methods are inherited from classes that are not related by the inheritance hierarchy. In this case, either the system provides the rule by which it selects the correct method or combines both methods, or the user defines explicitly which method to choose.
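The two ways of resolving a multiple inheritance conflict can be illustrated with a minimal Python sketch. The class and method names (Person, Student, Assistant, describe) are hypothetical and chosen only to mirror the student assistant example; Python's own method resolution order stands in for "the rule provided by the system" and is not the rule of any particular database model discussed here.

```python
class Person:
    def describe(self):
        return "person"


class Student(Person):
    def describe(self):  # overrides Person.describe
        return "student"


class Assistant(Person):
    def describe(self):  # overrides Person.describe
        return "assistant"


class StudentAssistant(Student, Assistant):
    # No local describe: two candidates are inherited from parents that are
    # not related to each other. Python selects one by its linearization
    # rule, which prefers the parent listed first.
    pass


class ExplicitStudentAssistant(Student, Assistant):
    def describe(self):
        # The user states explicitly which inherited method to use.
        return Assistant.describe(self)


system_choice = StudentAssistant().describe()
user_choice = ExplicitStudentAssistant().describe()
```

In this sketch the system rule selects the method of Student, while the explicit definition selects the method of Assistant.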

Polymorphism and late binding

The method name can be used to denote different methods in different classes. Therefore, an action with a given name can execute in different ways, depending on the type of object. A typical example is the display operation and the hierarchy of geometric objects, each having its own display operation. This property is usually called polymorphism or method overloading. Suppose that we have a set of different geometric objects. In order to display all the geometric objects from the set, the display operation is applied to each element of the set.

    foreach x in s do x.display;

The implementation of display cannot be determined at compile time. The program must decide at run-time which method is evaluated for each particular element of the set. This delayed relating of method names to their implementations is called late binding. The method is dynamically bound to objects in contrast with statically bound methods, which are bound to the calling objects at compile time. The reusability of programs is enhanced using late binding. New types of geometric objects can be added later, while the procedure for displaying the set of objects can remain the same.
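The display example can be written as a short Python sketch. The classes Circle, Square and Triangle and the helper display_all are hypothetical names introduced for illustration; Python resolves x.display() at run-time, which corresponds to late binding as described above.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def display(self):
        return f"circle of radius {self.radius}"


class Square:
    def __init__(self, side):
        self.side = side

    def display(self):
        return f"square of side {self.side}"


def display_all(shapes):
    # Counterpart of: foreach x in s do x.display;
    # The method to run is chosen per element at run-time.
    return [x.display() for x in shapes]


class Triangle:
    # A type added later; display_all does not need to change.
    def display(self):
        return "triangle"


shown = display_all([Circle(1), Square(2), Triangle()])
```

The procedure display_all is written once and keeps working when Triangle is introduced, which is the reusability argument made in the text.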

2.1.2 The O2 data model

The O2 database management system is designed and implemented by the Altair DBMS group [27]. The formalization of the O2 object-oriented data model presented by Lecluse in [48] treats the set of interrelated objects as a directed graph. The graph's vertices represent basic objects, tuples or sets. Each vertex is labeled by an object identifier. For example, the object <i1, [a: i2, b: 2]> is defined by the object identifier i1 and the value [a: i2, b: 2]. The tuple that represents the value of the object i1 includes two components labeled by attributes a and b. The corresponding object graph includes two nodes labeled by the object identifiers i1, i2 and a node which represents the basic object 2. The two arcs of the graph represent tuple attributes a and b. Obviously, the value of the tuple component can be an object identifier or a primitive object. The attribute value is not allowed to be a structured value that is not an object. Such attributes are usually called complex attributes [15].

The O2 data model includes basic and constructed types. Basic types include, for instance, integer or string. Types can be constructed using the type constructors set and tuple. The semantics of the type is defined using its interpretation, which is the set of all instances of a given type. An informal definition of the type interpretation is presented here. The interpretation of the basic type T is the set of object identifiers that refer to constants of the type T. The interpretation of the set-structured type {S} is the set of all subsets of the interpretation of the type S. The interpretation of the tuple-structured type T is the set of tuples which include at least the attributes defined by the type T. The model of a type T for a given database D is the greatest interpretation of the type T in D. The model of the type [a: int], for example, subsumes the model of the type [a: int, b: int], which is also defined in a given database. This property is used for the definition of the partial ordering of types. In brief, the type T1 is a subtype of the type T2 if the model of the type T2 subsumes the model of the type T1.

The behavior of objects is represented by methods. The type of the method is specified by a signature. The signature comprises the types of method parameters and the type of the method result. Again, the semantics of the signature is given by the model of the signature. The model of the signature is the set of all partial functions from the Cartesian product of the parameter type models to the model of the method result type. The partial ordering of signatures is defined using their models. The signature s1 is a subtype of the signature s2 if the model of s2 subsumes the model of s1. This definition differs from the standard definition of subtyping given by Cardelli in [18]. The presented subtyping rule is less restrictive but does not guarantee safe static type checking.
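The relationship between tuple types and their interpretations can be sketched in Python. This is a simplified illustration and not the O2 formalization itself: object identifiers are omitted, values are stored inline, Python lists stand in for sets, and the encoding of types (a string for a basic type, a pair ("set", S) for a set type, a dict for a tuple type) is an assumption made for this example.

```python
BASIC = {"int": int, "string": str}  # hypothetical basic types


def conforms(value, typ):
    """Return True if value belongs to the interpretation of typ."""
    if isinstance(typ, str):
        return isinstance(value, BASIC[typ])
    if isinstance(typ, tuple):  # ("set", S); lists stand in for sets
        return isinstance(value, list) and all(conforms(v, typ[1]) for v in value)
    # Tuple type: the value must have at least the attributes of the type.
    return isinstance(value, dict) and all(
        a in value and conforms(value[a], t) for a, t in typ.items()
    )


def is_subtype(t1, t2):
    """Structural check mirroring 'the model of t2 subsumes the model of t1'."""
    if isinstance(t2, str):
        return t1 == t2
    if isinstance(t2, tuple):
        return isinstance(t1, tuple) and is_subtype(t1[1], t2[1])
    return isinstance(t1, dict) and all(
        a in t1 and is_subtype(t1[a], t) for a, t in t2.items()
    )


narrow = {"a": "int", "b": "int"}  # [a: int, b: int]
wide = {"a": "int"}                # [a: int]
value = {"a": 1, "b": 2}
```

Here value conforms to both types, and narrow is a subtype of wide but not the other way round, which matches the example of [a: int] and [a: int, b: int] in the text.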

2.1.3 The IQL data model

The Identity Query Language (IQL) is a logic-based query language developed by Abiteboul and Kanellakis [3]. IQL integrates a rule-based logic programming language with some concepts of the object-oriented data model. Among others, the query language IQL demonstrates the advantages of using object identifiers as a database query language primitive. IQL provides a rich structural data model. The basic constructs of the IQL data model are o-values, classes, relations and types. The o-value is either a constant, an object identifier, a set of o-values or a tuple composed of o-values. The class represents an abstract concept which has a type and an interpretation. The class interpretation is defined by the oid assignment function, which assigns a set of oids to each class. The relations are defined in order to simplify the operational part of IQL [3].

IQL types can be structured by the use of type constructors tuple, union, intersection and set. Classes are treated as reference types that specify the type of object identifiers. The semantics of a type is presented by its interpretation, which includes all instances of a given type. The interpretation of a class is defined by the oid assignment function. The interpretation of the set-structured type {S} is the set of all subsets of the interpretation of the type S. The interpretation of the tuple-structured type T is the set of all tuples whose components are elements of the interpretations of the component types of T. The interpretation of a type constructed as the intersection of types T1 and T2, denoted T1 ∧ T2, is the intersection of the interpretations of T1 and T2. Similarly, the interpretation of the type constructed by the union constructor is defined using the operation union.

The consequences of the integration of the inheritance hierarchy of classes in the structural model are presented in [3]. The extended interpretation of classes named inherited oid assignment is defined instead of the previously defined class interpretation. The inherited oid assignment of the class C is the set of object identifiers which includes the instances of the class C and the instances of C's subclasses. The type interpretation is extended by replacing the ordinary class interpretation with the inherited oid assignment function. As a consequence of the inheritance hierarchy, the type of a class C can be defined as the composition of: (i) the type which is assigned to the class C, and (ii) types that are assigned to the superclasses of C. The *-interpretation is introduced in order to be able to define the type of the intersection of two types as the intersection of the interpretations of these two types. In addition, the interpretation of a type of the class C can be defined by using the interpretations of the types of C's superclasses.
The *-interpretation extends the ordinary interpretation of the tuple-structured type. The *-interpretation of a type T includes at least the attributes defined by the type T. The structure (i.e. type) of the attributes not defined by type T can be left unspecified. Using this type interpretation, the interpretations of tuple-structured types T1 and T2 can be composed into a single interpretation of type T = T1 ∧ T2 by the use of the set intersection operation. The *-interpretation of the type of the class C, which includes the properties of the class C and also the properties inherited from its superclasses, can now be specified as the intersection of the *-interpretations of the types that pertain to the class C and all its superclasses.¹

¹ A more detailed description of the *-interpretation and its use is given in [3].


2.1.4 F-Logic data model

Frame Logic (F-Logic) integrates a logic programming language with the constructs of the object-oriented and the frame-based languages [41]. These features are complex structural objects, object identity, inheritance, methods, and encapsulation. Only some concepts of the F-Logic data model which influenced the design of the OVAL data model are presented in this subsection. A more detailed description of the F-Logic data model is far beyond the scope of this work. A complete description of the model and the language parts of F-Logic can be found in [43].

The data model of F-Logic provides the means for the definition of complex objects. A database comprises individual objects, usually called instances, and the class objects. Each object has an identifier and a value. The class objects are treated as abstract representations of the set of their instances, as well as templates for generating new objects. The complete set of database objects is partially ordered. Although there is no separation between the schema and data, there is still strict separation between attributes used for specifying types and attributes that represent concrete properties. The range of the attribute of the first sort is an abstract value (e.g. class object). For example, the class object person has the property works for, the value of which is a class object department. An example of the use of the second type of attributes is the following. A person named Jim works in a department named e407. Still, both types of attributes can be used to describe the properties of a single object. From this perspective, the differences between the schema and the instance levels of the database are blurred.

Structural inheritance and behavioral inheritance are defined with respect to the partially ordered set of object identifiers. The data model allows the definition of two types of attributes depending on the inheritance property of attributes.
The inheritable attributes are always inherited by subclasses, while the non-inheritable attributes describe properties that pertain to the given object and are not inherited by subclasses. The structural inheritance of properties is monotonic. As a consequence, in the event of multiple inheritance conflict, the attribute in conflict obtains the value that is the union of all inherited properties. In contrast, the behavioral inheritance of methods is non-monotonic, which means that adding a clause to the database can alter the derivation of another clause.² The causes of the non-monotonic behavior are the overriding principle and multiple inheritance. The inheritance conflict in the case of methods is resolved by choosing the method non-deterministically.

² The definition of non-monotonic inheritance is given in [43].


2.1.5 EXTRA data model

The EXTRA data model [82] is used for modelling data structures manipulated by the EXCESS query language, which is implemented as part of the EXODUS database management system [19]. The type constructors defined by the EXTRA data model are set, tuple and array. In addition, the EXTRA data model uses ref and own type constructors for specifying the semantics of attributes. The value of the attribute which is specified by the own constructor is a simple data structure that does not have identity. The ref type is used for defining the attribute whose value is the reference to an object. Third, the own ref attribute is used for the specification of the reference to the object that is owned by the referencing object. This information can be used for specifying the semantics of the delete operation. The combination of set and array with ref, own and own ref type constructors provides a powerful tool for modelling complex objects [82].

The EXTRA data model is formalized using graphs. The nodes of a graph represent type constructors. The edges of the graph represent the component-of relationship. The type constructors which are used for labeling the nodes are set, tup, arr, ref and val. The val type constructor indicates a simple scalar value with no associated structure. The graph stands for the representation of the structural type. The semantics of the type (or graph) is given by the specification of its domain, which corresponds to the notion of the type interpretation. The domain of the type is defined by describing all possible instances of this type, where the structure of instances must follow the structure of a given type. This ordinary domain is further extended by taking into account the type hierarchy. In this way, the extended domain of a type includes instances of this type and instances of its subtypes. The extension of the reference type domains is treated separately.
The domain of the reference type includes all oids that refer to the instances of this type and oids that refer to the instances of its subtypes. The semantics of the reference type domains is defined by a set of rules which specify the relationships among the domains of existing reference types.

2.2 Database algebras

A database algebra consists of a set of operations intended for the manipulation of database entities which are defined in accordance with the particular data model. Algebraic operations can be combined into expressions that form a functional language [33, 5]. These expressions serve as the higher-level and symbolic representation of the user intent, which can be data retrieval or data modification. The algebraic expressions can be manipulated by a set of transformation rules intended to produce logically equivalent expressions, which can be more efficiently evaluated on a given database. Database algebras therefore also serve for the optimization of user queries.

In this section we first overview two well-known algebras that influenced the design of the OVAL object algebra. These are the relational algebra and the nested relational algebra. Next, three database algebras which have a strong influence on this area, and which are closely related to the OVAL algebra, are presented. The query algebra for objects [70] is presented first. The query algebra is used as the basis for some more recent work on algebras and object-oriented query optimizers [54, 50]. As in our approach, the query algebra comprises relational operations and the set of functional operations. The Complex Object Algebra [5] is presented in Subsection 2.2.4. The main contribution of the algebra is its operation replace, which is primarily intended for restructuring complex objects. Finally, the Logical Data Model (LDM) algebra is briefly described. The LDM is the generalization of the relational, hierarchical and network data models.

2.2.1 Relational algebra

The relational database model is defined by Codd [21]. It is based on the mathematical concept of a relation. The relation is a subset of the Cartesian product of the list of domains. It can be represented by a table, where each row is an n-tuple that represents the relationship among the set of values. Each column of the table represents an attribute. The domain of the attribute is the set of permitted attribute values. In the relational database model, the domain of the attribute is restricted to simple values (e.g. integer numbers).

The fundamental operations of the relational algebra are the following [45]. The operation select filters the relation by the use of the selection predicate. The selection predicate is an expression composed of attribute names and constants, which are related by comparison operations (for example =) and the boolean operations and, or and not. The result of the application of the operation select on the given relation is a relation that is a subset of the original relation. The set operations union and difference are used to perform the union and the difference of two relations. The operation project is used to select the desired columns of the argument relation. The result of the operation project is a relation which contains only those attributes specified by the operation project. Finally, the operation Cartesian product is defined to perform the Cartesian product of two relations. The set of attributes of the resulting relation is the union of attributes of both argument relations.

The additional operations of the relational algebra are the operations intersection and join. The operation intersection can be expressed by the operation difference. The operation join can be simulated by the operations Cartesian product and select. Given two argument relations, the join operation first performs the Cartesian product of relations and then filters the resulting relation with respect to the given selection expression. The natural join operation is a special case of the ordinary join operation, where the selection condition is defined implicitly. The values of identically named attributes are compared in order to filter the relation created by performing the Cartesian product operation on the input argument relations.
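The operations described above can be sketched in a few lines of Python. In this illustration a relation is a list of dicts (one dict per n-tuple, with attribute names as keys); this representation, the function names and the sample data are assumptions made for the example, and the product helper assumes that its two arguments have disjoint attribute names.

```python
def select(rel, pred):
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    out = []
    for row in rel:
        small = {a: row[a] for a in attrs}
        if small not in out:  # a relation holds no duplicate tuples
            out.append(small)
    return out


def difference(r, q):
    return [row for row in r if row not in q]


def intersection(r, q):
    # Intersection expressed by difference only: r - (r - q).
    return difference(r, difference(r, q))


def product(r, q):
    # Assumes disjoint attribute names in r and q.
    return [{**a, **b} for a in r for b in q]


def join(r, q, pred):
    # Join simulated by Cartesian product followed by select.
    return select(product(r, q), pred)


def natural_join(r, q):
    # The selection condition is implicit: equally named attributes must agree.
    return [{**a, **b} for a in r for b in q
            if all(a[c] == b[c] for c in a.keys() & b.keys())]


employees = [{"name": "Jim", "dept": "e407"}, {"name": "Ann", "dept": "e101"}]
departments = [{"dept": "e407", "floor": 4}]
offices = [{"dname": "e407", "floor": 4}]

joined = join(employees, offices, lambda t: t["dept"] == t["dname"])
natural = natural_join(employees, departments)
```

With this data, both joined and natural contain a single tuple for Jim; joined keeps both dept and dname, while natural merges the equally named attribute dept.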

2.2.2 Nested relational algebra

The nested relational model extends the relational database model by removing the restriction that the attribute value may only be a simple constant. The values of attributes are allowed to be relations. In this way, relations can comprise the primitive attributes and the relation-valued attributes. The algebras for nested relations have been studied by Roth and Korth [62], Abiteboul [1], Colby [22] and others (e.g. [36, 82]). An overview of the nested relational data model and algebra is presented in [68]. In the following paragraphs, the operations of nested relational algebra proposed by Roth and Korth in [62] are briefly overviewed.

Most operations of the nested relational algebra are defined as extensions of the relational algebra operations. Exceptions are the operations nest and unnest, which are used for restructuring nested relations and are new to the relational model. Let us start with a description of the nested relational operation union. The union of relations r and q is the relation that includes the n-tuples from both relations. Where two or more n-tuples of the argument relations have the same values of primitive attributes, the union of these tuples is obtained by computing the union of all relation-valued attributes. Other n-tuples from both operand relations are simply moved to the resultant relation. In other words, the union of two relations is the smallest relation that contains both of the argument relations. The nested relational operations difference and intersection are defined similarly to the operation union. The nested relational operations select and Cartesian product remain as in relational algebra. The semantics of the operation project is very similar to the relational projection. The relation is first projected to the specified set of attributes. The union operation is performed on the resulting set of tuples to obtain the final result of the projection. The only difference between the nested relational natural join operation and the flat relational natural join operation lies in treating the relation-valued attributes. The argument relations are joined with respect to the values of equally named primitive attributes at the flat level of the nested relation. The values of equally named relation-valued attributes are then merged using the intersection operation.

Finally, the operations nest and unnest are used for restructuring relations. The operation nest restructures the argument relation by nesting the specified set of attributes. In this way, the resulting relation has a new relation-valued attribute containing the values of specified attributes. The effect of the unnest operation is the opposite of the effect of the nest operation. The operation unnests the nested relation specified by the relation-valued attribute. In other words, the operation unnest(A) inserts the tuples of the nested relation denoted by the relation-valued attribute A into its parent relation.
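The restructuring operations nest and unnest can be sketched as follows. This is a minimal Python illustration and not the definition given by Roth and Korth: relations are lists of dicts, a relation-valued attribute holds a list of dicts, primitive attribute values are assumed to be hashable, and the attribute names in the sample data are hypothetical.

```python
def nest(rel, attrs, name):
    """Group the attributes in attrs into a new relation-valued attribute."""
    groups = {}
    for row in rel:
        # Tuples that agree on all remaining attributes fall into one group.
        key = tuple(sorted((a, v) for a, v in row.items() if a not in attrs))
        inner = {a: row[a] for a in attrs}
        bucket = groups.setdefault(key, [])
        if inner not in bucket:
            bucket.append(inner)
    return [{**dict(key), name: inner} for key, inner in groups.items()]


def unnest(rel, name):
    """Insert the tuples of the nested relation back into the parent relation."""
    return [
        {**{a: v for a, v in row.items() if a != name}, **inner}
        for row in rel
        for inner in row[name]
    ]


flat = [
    {"dept": "e407", "emp": "Jim"},
    {"dept": "e407", "emp": "Ann"},
    {"dept": "e101", "emp": "Bob"},
]
nested = nest(flat, ["emp"], "staff")
restored = unnest(nested, "staff")
```

For this flat input, unnest applied to the result of nest yields the original relation again.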

2.2.3 Encore/Equal algebra

A query algebra for object-oriented databases was developed by Shaw and Zdonik [70]. The data model includes abstract data types, type inheritance, typed collections of objects and objects with identity. Each type is described by the type name, the set of properties, supertypes and operations. The properties are typed objects implemented either as stored values or as procedures. In addition to user-defined abstract types, the set of predefined atomic types (e.g. string), and parametrized types Set[T] and Tuple[T] are defined for the description of the static structure of objects. Objects can be compared using different kinds of equality operations, which are defined in terms of i-equality. The i-equality is defined to compare the states of the argument objects up to the specified nesting level.

The algebra is defined on collections of objects with identity. Most operations return collections of existing database objects, while some create new objects. The algebra includes the standard relational operations selection, union, difference and intersection. The object join operation ojoin creates pairs of objects that are filtered using the parameter predicate. The operations image and projection are used for the application of one or more functions (queries) to the objects from the queried collection, yielding a collection of objects or tuples respectively. The function applied to each object from the argument collection can be an arbitrary composition of operations. The flatten operation restructures the set of sets into a flat set. The nest and unnest operations are defined similarly to the NF² algebra restructuring operations nest and unnest. Finally, the DupEliminate and Coalesce operations are defined to eliminate duplicate objects from the collections. The operation DupEliminate eliminates duplicates from the input set of objects with respect to the type of equality, which is specified as the operation parameter. Similarly, the operation Coalesce eliminates the duplicates with respect to the i-equality of the specified object component.
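A rough Python sketch of image, flatten and duplicate elimination is given below. It is only an approximation of the operations of Shaw and Zdonik: the Obj class is hypothetical, collections are Python lists, and the two key functions stand in for two kinds of equality (identity and shallow value equality) rather than for the full definition of i-equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    oid: int       # object identity
    value: tuple   # object state, kept flat for this sketch


def image(coll, f):
    # Apply a function (query) to every object of the collection.
    return [f(o) for o in coll]


def flatten(sets):
    # Restructure a set of sets into a flat set.
    return [x for s in sets for x in s]


def dup_eliminate(coll, key):
    # Keep the first object of each equivalence class induced by key.
    seen, out = set(), []
    for o in coll:
        k = key(o)
        if k not in seen:
            seen.add(k)
            out.append(o)
    return out


objs = [Obj(1, ("Jim",)), Obj(2, ("Jim",)), Obj(1, ("Jim",))]
by_identity = dup_eliminate(objs, key=lambda o: o.oid)
by_value = dup_eliminate(objs, key=lambda o: o.value)
names = image(objs, lambda o: o.value[0])
```

The choice of equality changes the result: two objects remain under identity, one under value equality. The sketch also shows why flatten cannot be undone without extra information, since flatten([["Jim", "Ann"], ["Bob"]]) and flatten([["Jim"], ["Ann", "Bob"]]) give the same flat collection.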

2.2.4 Complex Object Algebra

The Complex Object Algebra (COA) was proposed by Abiteboul and Beeri in [5]. Calculus-based and algebraic languages for complex objects are presented in their work, and the equivalence of these two languages is shown. The data model of the Complex Object Algebra [5] is an extension of the NF² data model. Unlike the NF² data model, there is no restriction on the use of the tuple and set type constructors for the definition of the object structure. The model does not provide the concept of object identity and the constructs for modelling behavior, while it includes most features of the structural part of the object-oriented database model.

The operations of COA can be composed into queries by means of the function composition operator and by using higher-order operations of the algebra. Therefore, the algebra is a functional language. The algebra includes the relational operations union, intersection and difference. A variation of the Cartesian product operation is defined; it can perform the Cartesian product of sets which include arbitrarily structured values. The powerset operation computes all subsets of a given argument set. This operation makes the algebra equivalent to the calculus. The set-collapse operation results in the union of the argument set of sets. Next, the standard selection operation is used to filter the set of values.

The operation replace is introduced to provide the facilities to restructure values and to allow us to apply queries to the nested structures. The operation replace realizes the application of the parameter query on the set of structured values. The parameter query of the replace operation is specified by a functional notation called replace specifications. This notation subsumes all definable queries of the algebra. In addition, the notation allows the queries to be structured by the arbitrary use of the set and tuple constructors.
In this way, the replace parameter can form a "structured query" which, when applied to the set of values, can be used to construct complex data structures. Nesting of replace operations 21

can serve as a tool for accessing data structures that are arbitrarily nested in the argument data structure.

2.2.5 LDM algebra

The Logical Data Model (LDM) was developed by Kuper and Vardi [46, 47]. The main intention of the work was to generalize the relational, hierarchical and network models. Both logic-based and algebra-based languages are defined. The LDM database schema is a directed graph, where nodes represent types and arcs represent the properties of types. Each graph node is of one of the following types: basic, collection, composition or union. Instances of the nodes are objects that are composed of object identifiers, called surrogates, and values. The sets of object identifiers that refer to the instances of different nodes do not overlap. An ordering among the graph nodes is defined with respect to the arcs that connect nodes in the graph. This ordering is used as the basis for the evaluation of logic expressions.

Each operation of the LDM algebra creates a new node and the corresponding set of node instances. The node instances created by an operation are either newly generated objects or copies of existing database objects. The copies have new object identifiers. The operations can be classified into three groups.

The first group includes operations that copy and combine existing nodes. The operation copy creates a new node of the same type, containing copied values with newly created identifiers. The second operation creates a new node with the parameter constant as the single instance. The operation powerset creates all subsets of the given set object. The creation of the union node can be characterized as an operation performing the generalization of the given set of nodes. Given a set of parameter nodes, the operation creates a new node whose instances include references to the instances of the parameter nodes.

The second group includes selection operations. The ordinary selection operation is defined similarly to the relational selection. The operation selects those tuples whose specified components are related by the given relationship. The containment selection can operate on every node of the schema that includes the child node. The result of the selection is a new node whose instances refer to the copies of instances of the child node. The selection condition tests whether the child node instance is a component of one of the parent node instances.

The last group of LDM operations includes the operations Cartesian product, union, difference and projection. These operations are again defined very similarly to the corresponding relational algebra operations.

2.3 Functional database query languages

The functional query language FQL [16, 17] and its successors, the languages GDL [12] and O2FDL [53], had a strong influence on the design of the OVAL query language. The main characteristic of the FQL query language is its small set of simple functional operations which can be combined using the function composition operation. Consequently, FQL allows the incremental construction of more complex queries from simple queries. The OVAL query language can be seen as a generalization of FQL for the manipulation of objects.

Another database language which also belongs to the family of functional database languages is FAD [25]. FAD is a computationally complete database programming language based on a set of powerful functional operations intended for querying the database. It has been used as a programming language for a parallel database system. The following paragraphs provide an overview of the main features of FQL, GDL, O2FDL and FAD.

2.3.1 FQL

The Functional Query Language (FQL) [16, 17] is based on ideas from the functional programming language FP proposed by Backus [10]. The database structure is represented by the functional database model. The static properties of objects are modelled by functions that map instances of a given type to instances of the type representing the property. The inverse of an FQL function can be obtained by placing the operator "!" in front of the function. However, the existence of the inverse function depends on the particular database. An FQL query is a function which can be combined into a more complex query using the function composition operation. In this way, queries can be developed incrementally from simpler queries. FQL queries are defined on streams of objects, where a stream of objects denotes a "virtual" sequence of objects whose physical representation should be of no concern to the programmer.

The main building blocks of the FQL query language are its predefined stream manipulating functions: extension, restriction and construction. The extension function realizes the well-known maplist function [33]: the parameter function is applied to each element of the input stream, producing a stream of the results of the parameter function evaluation. The restriction function is used for filtering a stream of objects, producing a new stream of objects as the result of the function evaluation. Stream objects are filtered with respect to the result of the boolean parameter function of the restriction. The construction function applies a set of parameter functions to each element of the input stream, yielding a stream of tuples. FQL allows the use of arithmetic operations. These are defined on streams of tuples which are composed of two components. Next, the function /+ is used to sum the elements of the input stream, while the function len counts the number of stream elements. Finally, the use of recursion in FQL provides a simple tool for querying recursive data structures.
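The stream functions described above can be illustrated with Python generators. This is only a sketch of the ideas and not FQL syntax; the function names (extension, restriction, construction, compose, stream_sum, stream_len) and the sample employee records are assumptions introduced for the example.

```python
from functools import reduce

def extension(f):
    """Apply f to every element of the input stream (the maplist idea)."""
    return lambda stream: (f(x) for x in stream)

def restriction(pred):
    """Keep only the stream elements for which the boolean function holds."""
    return lambda stream: (x for x in stream if pred(x))

def construction(*fs):
    """Apply several functions to each element, yielding a stream of tuples."""
    return lambda stream: (tuple(f(x) for f in fs) for x in stream)

def compose(*fs):
    """Left-to-right function composition of stream functions."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

def stream_sum(stream):
    """Counterpart of the summing function: add up the stream elements."""
    return sum(stream)

def stream_len(stream):
    """Counterpart of len: count the stream elements."""
    return sum(1 for _ in stream)

# Hypothetical sample data, not taken from the thesis.
employees = [
    {"name": "Ann", "dept": "R&D", "salary": 120},
    {"name": "Bob", "dept": "Sales", "salary": 90},
    {"name": "Cid", "dept": "R&D", "salary": 100},
]

# A query built incrementally from simpler queries by composition.
rd_salaries = compose(
    restriction(lambda e: e["dept"] == "R&D"),
    extension(lambda e: e["salary"]),
)

total = stream_sum(rd_salaries(employees))
count = stream_len(rd_salaries(employees))
pairs = list(construction(lambda e: e["name"], lambda e: e["salary"])(employees))
```

Because every query is an ordinary function from streams to streams, a larger query such as rd_salaries is obtained purely by composing smaller ones, which mirrors the incremental style of query construction attributed to FQL.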

2.3.2 GDL and O2FDL

The GDL [12] functional query language evolved from the query languages FQL [16] and DAPLEX [71]. The main contributions of GDL are: (i) the proposal of an implementation technique for a functional query language, and (ii) the addition of facilities for querying nested relations. GDL queries are defined on streams of tokens, which are used for the representation of the stream of objects together with their structure. The language is based on a set of productions, which are used as the stream rewriting rules. Productions realize the FQL functions extension, restriction and construction, and the operations group and normalize. The FQL operation for filtering the set of objects is extended to allow the use of existential quantification in the filtering expression. The group operation nests the elements of the argument set by one level. The normalize function replaces the stream of nested relations with the stream of the corresponding relations in the first normal form. All GDL functions are applied to the innermost sequences in the case of nested sequences. The syntax of the language has been modified in comparison with FQL in order to serve as a stand-alone query language.

The object-oriented functional data language (O2FDL) [53] further extends the functionality of GDL to be able to manipulate objects. The data model integrates the features of the functional model with the object-oriented data model. The static properties of objects are represented using single-valued and set-valued functions. The inheritance hierarchy of classes can be defined. The O2FDL data language is intended to serve as the formal basis for database programming language implementations. The semantics of the language is defined using extended lambda calculus and a version of denotational semantics for lambda calculus. In contrast to FQL and GDL, O2FDL is a computationally complete functional programming language which includes facilities for the definition of persistent objects and for querying a database.

2.3.3 FAD

FAD [25] is a general-purpose database programming language. The FAD data model is based on objects; it does not include the concept of object identity as a separate modelling construct. Object identifiers are treated as an implementation issue. Objects are defined as pairs comprising object identifiers and values. The object value is a data structure which can be constructed using the tuple and set type constructors.

The FAD language is based on a simple grammar whose main construct is an action. The action is used to indicate a computation that returns data after possibly accessing or updating data. Only a brief description of the set of FAD actions is given here. Other aspects of FAD and precise definitions of FAD actions can be found in [25]. The actions can be classified into basic and constructed actions. The basic actions may realize the construction of simple or structured values, object component selection, object creation and modification, etc.

The set of constructed actions is the core of the FAD data language. The set includes the operations let, if, while, filter, pump and group. The let statement is used to define variables. The statement includes one or more variable assignments. It is concluded by the final action, which defines the result returned by the let action. The if statement realizes ordinary conditional execution. The while-do statement is actually a higher-order function with three parameters defining the initial state, the loop function and the exit function. The loop function is executed until the result returned by the loop function is null. The exit function is executed before the action terminates. The function filter applies an argument function to the Cartesian product of the sets, which are specified as the second argument of the function filter. The result of the function filter is a set of objects, which are the result of applying the filter parameter function to the elements of the parameter set of objects.

The pump function has four parameters: a unary operation, a binary operation, an identity object and a set of objects. The unary operation (i.e. function) is applied to each element of the parameter set. The resulting set is then reduced by the use of the binary operation. The identity object serves as the initial argument of the binary operation application. Finally, the group function creates the set of equivalence classes of the argument set. The equivalence classes are created according to the results of the argument function application.
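The behavior described for pump and group can be sketched in Python as a map followed by a fold, and as a partition by a key function. This is an illustration under stated assumptions rather than FAD itself: the signatures, the argument order and the sample set of numbers are hypothetical, and the binary operation is assumed to be associative and commutative because a set has no defined order.

```python
from functools import reduce

def pump(unary, binary, identity, objects):
    """Apply `unary` to each element, then reduce the results with `binary`.

    `identity` serves as the initial argument of the reduction.
    """
    return reduce(binary, (unary(x) for x in objects), identity)

def group(key, objects):
    """Create the equivalence classes of `objects` induced by `key`.

    Two elements fall into the same class when `key` returns equal results.
    """
    classes = {}
    for x in objects:
        classes.setdefault(key(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}

# Hypothetical sample data.
numbers = {1, 2, 3, 4, 5, 6}

sum_of_squares = pump(lambda x: x * x, lambda a, b: a + b, 0, numbers)
by_parity = group(lambda x: x % 2, numbers)
```

Here sum_of_squares adds the squares of all set elements starting from the identity 0, and by_parity contains one equivalence class for the odd and one for the even numbers.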

Chapter 3 Data Model Formalization

3.1 Introduction

One of the most important design decisions in the formalization of the object-oriented data model is the treatment of the so-called dichotomy between objects and values [14, 3] in the model. First, the model can be based on an object abstraction, whereas values are not supported by the model. In this case, each object value is composed solely of other objects. This approach is used in database languages based on logic [41] and in the functional database programming language FAD [25]. The second approach uses objects and structured values as the basic building blocks of the data model, whereby the components of the object value are allowed to be objects as well as values. This approach has usually been used in the procedural database programming languages based on the C or C++ programming languages [23, 49, 7]. The OVAL data model uses the latter approach for the following reasons: we intend to develop a data model which can be integrated with the type system of database programming languages based on C++; further, we consider that from a purely technical point of view, the first approach can be seen as a restriction of the second one.

We claim that the main obstacle to a more uniform formalization of the object-oriented data model is the unnecessarily strict distinction between the schema and the instance levels of the database. The formalization we present integrates them by treating classes as abstract objects in a similar way as frames [28] are used to represent abstract concepts. The class is described by an identifier and a value, like any other object. The main difference between the class objects and the individual objects is in their interpretation. The interpretation of an individual object is the object itself, while the interpretation of a class is a set of more specific objects. Such a definition of classes provides a unified view of a database, which is from this perspective a collection of objects, including individual and class objects.

In accordance with this view, the static part of the standard notion of a type [18] is defined as the structured value composed of class objects and primitive types. In this way, close integration between classes and types is established. The type interpretation, for instance, can be defined as a straightforward extension of the class interpretation. Similarly, the presented view of a database leads to the definition of two kinds of orderings of database elements. First, the set of classes is partially ordered by the subclass relationship. This partial ordering is further extended to include the individual objects. Secondly, the subtype partial ordering relationship is defined for types and further generalized to all structured values.

The rest of this chapter is organized as follows. The structural part of the data model is presented in Section 3.2. The behavioral part of the data model is presented in Section 3.3. The final section includes some concluding remarks and some areas for further study.

3.2 Structural model

In this section, the structural part of the OVAL data model is presented. As was pointed out in the introduction, the salient feature of the OVAL data model is its treatment of the schema information. The class is treated as an abstract object which describes the common properties of a set of its instances. The uniform treatment of database objects provides the basis for simple definitions of the relationships among data model concepts. Furthermore, the uniform treatment of database elements allows the definition of a set of simple operations for the manipulation of the individual objects as well as classes. In this way, a tight interrelationship between the data model formalization and the algebra is established.

This section is organized as follows. Firstly, the basic building blocks of the OVAL data model are defined in terms of o-values and objects in Subsection 3.2.1. The concept of a class object and its relationships to the individual objects are defined in Subsection 3.2.2. Next, the partial ordering relationship is defined on the set of database objects. In Subsection 3.2.3, we introduce the notion of a type and further present the relationships between types and classes. The previously defined partial ordering of objects is extended to o-values.

3.2.1 Object and o-value

Let us first define the basic terminology. Firstly, we assume the existence of a predefined infinite set of object identifiers $\mathcal{O}$. The object identifier¹ is the unique identification of an object in a given database. The set of object identifiers subsumes an infinite set of constants denoted by $\mathcal{D}$. A constant can be, for example, an integer number or a string. Next, the existence of an infinite set of values $\mathcal{V}$ is assumed. This set includes the set of oids $\mathcal{O}$ and the set of structured values. We distinguish between two sorts of structured values: sets and tuples. The set is denoted by the expression $\{a_1, a_2, \ldots\}$, where $a_1, a_2, \ldots$ are the elements of the set. The tuple is denoted by the expression $[A_1 : a_1, \ldots, A_n : a_n]$, where the pairs $A_i : a_i$ ($i \in [1..n]$) represent the tuple components. Each tuple component comprises the component name $A_i$, also called attribute, and the component value $a_i$. We assume the existence of a set of attribute names $\mathcal{A} = \{A_1, A_2, \ldots\}$. Since the values are primarily used to represent the states of the objects, we call them o-values [3]. The o-value is formally defined as follows:

Definition 1 (o-value, Abiteboul [3]) The o-value $v$, $v \in \mathcal{V}$, can be one of the following: (i) an object identifier: $v = id$, $id \in \mathcal{O}$, (ii) a set value: $v = \{v_1, \ldots, v_n\}$, where $v_i \in \mathcal{V}$, or (iii) a tuple value: $v = [A_1 : v_1, \ldots, A_n : v_n]$, where $v_i \in \mathcal{V}$ and $A_i \in \mathcal{A}$.

For example, the o-value [age : 50, kids : {i1, i3}, lives_at : "Brisbane", work_as : "teacher"] represents the value of an instance of the class person. The value {i1, i3} denotes a set of object identifiers. Note that Definition 1 so far states no constraints on the structure of the value in terms of type. For instance, a set can contain o-values with a heterogeneous structure.

There are two aspects of objects. First, every object has an identity which is realized by the object identifier. The object identifier distinguishes the object from all other objects in the database. Second, every object has its value which represents its state. The two basic object aspects are connected by means of a valuation function $\nu$, which maps each oid to the corresponding value. The notion of the object is defined as follows.

¹ The term object identifier is often shortened to oid.

Definition 2 (Object) The object is the pair $o = (id, v)$, such that $id \in \mathcal{O}$ and $v \in \mathcal{V}$.

We distinguish between primitive and user-defined objects. The object identifier of a primitive object is the same as its value. For instance, the primitive object $(1, 1)$ represents the integer number 1. The value of a user-defined object can be a set, a tuple or an object identifier. The valuation function can now be defined as the mapping between the object identifiers and o-values.

Definition 3 (Valuation function) The valuation function $\nu$ maps each oid $id$, $id \in \mathcal{O}$, to its value $v$, $v \in \mathcal{V}$; $\nu : \mathcal{O} \to \mathcal{V}$.
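The notions of o-value, object and valuation function can be made concrete with a small Python sketch. The representation is an assumption made for illustration only: user-defined oids are modelled by a hypothetical Oid class, constants by Python integers and strings, set values by frozenset and tuple values by dict. The oid label i2 and the STATE table are likewise invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Oid:
    """Identifier of a user-defined object (hypothetical helper type)."""
    label: str

def is_o_value(v) -> bool:
    """Check the three cases of the o-value definition recursively."""
    if isinstance(v, (Oid, int, str)):
        # Object identifier; constants are object identifiers as well.
        return True
    if isinstance(v, frozenset):
        # Set value: every element must itself be an o-value.
        return all(is_o_value(x) for x in v)
    if isinstance(v, dict):
        # Tuple value: attribute names map to o-values.
        # Note: dicts are unhashable, so this sketch cannot place a tuple
        # value directly inside a frozenset.
        return all(isinstance(a, str) and is_o_value(x) for a, x in v.items())
    return False

# State of the database: the values of user-defined objects.
STATE = {
    Oid("i2"): {
        "age": 50,
        "kids": frozenset({Oid("i1"), Oid("i3")}),
        "lives_at": "Brisbane",
        "work_as": "teacher",
    },
}

def nu(oid):
    """Valuation function: map an oid to its o-value."""
    if isinstance(oid, (int, str)):
        return oid          # a primitive object is its own value
    return STATE[oid]

person_value = nu(Oid("i2"))
```

The object itself is then simply the pair (Oid("i2"), person_value), and a primitive object such as (1, 1) arises because nu returns a constant unchanged.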

3.2.2 Classes and objects

The class is an object that acts as an abstract representation of a set of objects that share similar structure and behavior. The elements of this set are called instances of the class. Like other objects, the class is described by an object identifier and a set of properties that represent its value. In the following text, the object identifier of the class is represented by a label. For example, the object identifier of the class person is represented by the label person. The set of all object identifiers that refer to class objects in a given database is denoted by $\mathcal{O}_C$. Accordingly, the set of ordinary data objects, or individual objects, is denoted by $\mathcal{O}_D$. Therefore, the set of object identifiers from a given database is $\mathcal{O} = \mathcal{O}_D \cup \mathcal{O}_C$.

The value of the class is an o-value whose components are object identifiers that refer to class objects. For example, the value of the class person is the tuple [name : string, age : int, friends : {person}, lives_at : address]. The labels int, string, address and person denote object identifiers that refer to class objects. The classes int and string are called primitive classes. As for constants, the identity of the primitive class equals its value. For instance, the class int is represented by the object (int, int), where the label int denotes the object identifier, which also represents the value of the class object.

The most significant difference between the class objects and the individual objects is in their interpretation. While the interpretation of the instance object is the object itself, the interpretation of the class object is a set of objects, i.e. the instances of the class object. The class interpretation is defined as follows:

Definition 4 (Class interpretation, Abiteboul [3]) Let $C$, $C \in \mathcal{O}_C$, be the class object. The interpretation of $C$ is $\pi(C) \subseteq \mathcal{O}_D$, such that $\pi(C) \cap \pi(C_j) = \emptyset$ for all $C_j \in \mathcal{O}_C$ and $C_j \neq C$.

As can be seen from the definition, we used common engineering intuition, as stated by Abiteboul in [3], by treating the object as the instance of a single class. The consequence of this design decision is that the sets of object identifiers which represent the interpretations of classes are disjoint². Therefore, an object identifier that represents an individual object is an element of exactly one class interpretation. The class interpretation can be used to formally specify the instantiation relationship among class and data objects: $x$ instance of $y$, where $x \in \mathcal{O}_D$ and $y \in \mathcal{O}_C$, if $x \in \pi(y)$.

Partially ordered set of objects

For the purpose of sharing the structure and the behavior of objects, the inheritance hierarchy of classes is defined. It is defined by means of the binary relationship among class objects called isa. The expression x isa y says that the class x is a subclass of the class y. We assume that the subclass hierarchy is defined by the user when specifying the database conceptual schema. The isa relationship organizes classes into a partially ordered set. We extend this partially ordered set of classes to all objects in the database by the use of the instantiation relationship defined in the previous section. Furthermore, this partial ordering is later extended to all o-values that exist in the database.

Definition 5 (isa poset) The isa poset is the pair $(\mathcal{O}, \preceq_i)$. Let $id_1, id_2 \in \mathcal{O}$, then $id_1 \preceq_i id_2$, if one of the following holds:

1. $id_1 = id_2$,
2. $id_1, id_2 \in \mathcal{O}_C \wedge \exists id_c : id_c \in \mathcal{O}_C \wedge id_1 \; isa \; id_c \wedge id_c \preceq_i id_2$, or
3. $id_1 \in \mathcal{O}_D \wedge id_2 \in \mathcal{O}_C \wedge \exists id_c : id_c \in \mathcal{O}_C \wedge id_1 \in \pi(id_c) \wedge id_c \preceq_i id_2$.

We call the relationship $\preceq_i$ the more specific relationship. An example of an isa poset is defined by the following terms: student $\preceq_i$ person, employee $\preceq_i$ person, instructor $\preceq_i$ person, student_assist $\preceq_i$ student, student_assist $\preceq_i$ instructor, jim $\preceq_i$ instructor, jane $\preceq_i$ student and john $\preceq_i$ ta. Note that labels denote oids which stand for the class and data objects.

The ordinary class interpretation maps the class to the set of oids representing its instances. By taking into account the previously defined partial ordering of classes and data objects, another class interpretation is defined. The inherited interpretation [3] of the class $C$ includes the instances of the class $C$ and the instances of the subclasses of $C$.

² The class interpretation is sometimes called the class extent [15, 9].

Definition 6 (Inherited class interpretation, Abiteboul [3]) Let $C$ be the class. The inherited interpretation of $C$ is

$$\pi^{*}(C) = \bigcup_{C_j \preceq_i C \;\wedge\; C_j \in \mathcal{O}_C} \pi(C_j).$$
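The three cases of Definition 5 and the union of Definition 6 translate directly into a short recursive test. The following Python sketch assumes that the schema is given as two dictionaries: ISA, holding the direct isa edges, and PI, holding the ordinary class interpretation. Both names are hypothetical, the class hierarchy is assumed to be acyclic, and placing john in student_assist is an assumption made only to have a complete example.

```python
# Direct subclass edges "x isa y", as specified by the schema designer.
ISA = {
    "student": {"person"},
    "employee": {"person"},
    "instructor": {"person"},
    "student_assist": {"student", "instructor"},
}

# Ordinary class interpretation: each individual belongs to exactly one class.
PI = {
    "person": set(),
    "student": {"jane"},
    "employee": set(),
    "instructor": {"jim"},
    "student_assist": {"john"},   # assumption made for this example
}

def more_specific(id1, id2):
    """Return True if id1 is more specific than (or equal to) id2."""
    if id1 == id2:                                  # case 1: reflexivity
        return True
    if id1 in PI and id2 in PI:                     # case 2: two classes
        return any(more_specific(c, id2) for c in ISA.get(id1, ()))
    if id2 in PI:                                   # case 3: individual, class
        return any(id1 in PI[c] and more_specific(c, id2) for c in PI)
    return False

def inherited_interpretation(c):
    """Union of the interpretations of c and of all its subclasses."""
    return set().union(*(PI[cj] for cj in PI if more_specific(cj, c)))

persons = inherited_interpretation("person")
students = inherited_interpretation("student")
```

With this data, persons contains all three individuals, while students contains jane and john, since student_assist is a subclass of student.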

3.2.3 Types and classes

The static structure and behavior of objects are represented by a type. Formally, the type is the pair $(\mathcal{S}, \mathcal{P})$, where the component $\mathcal{S}$ represents the static structure of objects, denoted as structural type, and $\mathcal{P}$ describes their behavior. We ignore the representation of object behavior in this section and present only the static object properties. The behavior of objects is presented in Section 3.3. The structural type is an o-value composed of simpler types³. Analogous to our perception of classes, structural types stand for the abstract representation of the set of more specific o-values. The set of all types from a given database is denoted by $\mathcal{V}_T$, where $\mathcal{V}_T \subseteq \mathcal{V}$.

Definition 7 (Structural type) The o-value $T$, $T \in \mathcal{V}_T$, is a structural type if it is one of the following: (i) the reference type: $T \in \mathcal{O}_C$, (ii) the set type: $T = \{S\}$, where $S \in \mathcal{V}_T$, or (iii) the tuple type: $T = [A_1 : T_1, \ldots, A_n : T_n]$, where $T_i \in \mathcal{V}_T$ and $A_i \in \mathcal{A}$.

³ In the remainder of this section we will use the term type in place of structural type.

By this definition, the set of all structural types also includes the set of classes, which now have the role of reference types. For instance, the type of employee is T = [name : string, age : int, works : organization, lives_at : address]. The type of the attribute age is the object identifier that refers to the primitive class int. An object identifier that refers to a primitive class also has the role of a primitive type. The instances of primitive types are constants which are the elements of $\mathcal{D}$. The primitive type is treated as an "abstract constant". The class organization from the above example plays the role of the reference type. The label organization stands for the object identifier that refers to the class organization. The instances of the object identifier organization are the object identifiers that refer to the individual objects representing organizations. Finally, note that the type T from the above example is an o-value composed of primitive types and object identifiers that refer to user-defined classes.

The type of the class

The properties of a class object are represented by its type. Therefore, the type of the class corresponds to its value. Due to the subclass hierarchy and the use of the inheritance principle, the type that corresponds to the class consists of: (i) properties that are directly associated with the class, and (ii) inherited properties. Consequently, two assignment functions are defined. Given the class $C$, the type assignment function $\tau$ returns the properties which are directly associated with the class $C$. Next, the inherited type assignment function $\tau^{*}$ returns the complete type of the given class $C$. For example, the type assignment function applied to the class student is defined as $\tau$(student) = [degree : int, courses : {course}]. The result of the application of the inherited type assignment function to the same class is $\tau^{*}$(student) = [name : string, age : int, degree : int, courses : {course}]. The latter type represents the value of the class student. It includes properties pertaining to the classes student and person.

The type that is directly associated with the class is specified by the user when designing the conceptual schema of the database. The type assignment function then simply returns the previously defined o-value for the particular class. The complete type of the given class can be obtained by the use of the isa poset, whereby a class inherits all the properties of the classes that precede it in the isa poset. The inherited type assignment is formally specified as follows:

Definition 8 (Inherited type assignment) Let $C$ be the class, then the inherited type assignment is defined as $\tau^{*}(C) = [A_1 : T_1, \ldots, A_k : T_k]$, $A_i \in A_{\cup}$, where

$$A_{\cup} = \bigcup_{C \preceq_i P_j} Attr(P_j).$$

The result of the function application consists of all the properties that describe the given class object. The inherited type assignment function realizes the valuation function for oids that represent classes. The previously given definition of inherited type assignment describes the structural inheritance of class properties. It is called "structural" since the structure of classes is propagated through the inheritance hierarchy of classes.
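The computation behind the inherited type assignment can be sketched as a walk over the superclasses of a class. In the following Python fragment, the dictionaries ISA and TAU and the helper names are hypothetical: TAU plays the role of the directly associated type of each class, and set types are written as one-element frozensets. The result is kept as a list of (defining class, attribute, type) triples, so that two inherited attributes with the same name would both remain visible.

```python
# Direct superclasses of each class (assumed acyclic).
ISA = {
    "person": [],
    "student": ["person"],
}

# Properties directly associated with each class.
TAU = {
    "person": {"name": "string", "age": "int"},
    "student": {"degree": "int", "courses": frozenset({"course"})},
}

def superclasses(c):
    """Return c followed by all classes that c is more specific than."""
    seen = [c]
    for parent in ISA.get(c, []):
        for q in superclasses(parent):
            if q not in seen:
                seen.append(q)
    return seen

def inherited_type(c):
    """Collect the properties of c and of all its superclasses.

    Each entry records the class where the attribute is defined, so that
    attributes with equal names are not silently merged.
    """
    return [(p, a, t) for p in superclasses(c) for a, t in TAU[p].items()]

student_type = inherited_type("student")
student_attrs = {a: t for _, a, t in student_type}
```

For the class student, the directly associated type is TAU["student"], while student_attrs additionally contains the attributes name and age inherited from person.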

Partially ordered set of o-values

The partial ordering of objects, previously denoted as the isa poset, can be extended to o-values. This partial ordering includes all entities stored in a database including types and classes. In the following definition, we extend the previously defined relationship more specific, denoted by $\preceq_i$. Intuitively, the o-values that are "below" in the poset are more specific than (or refine) the o-values that are "higher" in the ordering. The poset is defined in the following way:

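Definition 9 (o-value poset) The o-value poset is a pair $(\mathcal{V}, \preceq_o)$. Let $v_1, v_2 \in \mathcal{V}$ be o-values. The o-value $v_1$ is more specific than the o-value $v_2$, denoted by $v_1 \preceq_o v_2$, if one of the following holds:

1. $v_1 \in \mathcal{O} \wedge v_2 \in \mathcal{O} \Longrightarrow v_1 \preceq_i v_2$,
2. $v_1 = \{s_1, \ldots, s_k\} \wedge v_2 = \{t_1, \ldots, t_k\} \Longrightarrow \forall t_i (t_i \in v_2 \wedge \exists S_j (S_j \subseteq v_1 \wedge \forall s_l (s_l \in S_j \wedge s_l \preceq_o t_i)) \wedge \forall s_u (s_u \in (v_1 - \bigcup_{1 \le j \le p} S_j) \wedge \not\exists t_v (t_v \in v_2 \wedge s_u \preceq_o t_v)))$, or
3. $v_1 = [A_1 : a_1, \ldots, A_n : a_k] \wedge v_2 = [B_1 : b_1, \ldots, B_k : b_l] \Longrightarrow \forall b_i (b_i \in C(v_2) \wedge \exists S_j (S_j \subseteq C(v_1) \wedge \forall a_l (a_l \in S_j \wedge A_l = B_i \wedge a_i \preceq_o b_l)) \wedge \forall a_u (a_u \in (C(v_1) - \bigcup_{1 \le j \le p} S_j) \wedge \not\exists b_v (b_v \in C(v_2) \wedge a_u \preceq_o b_v)))$, where $C(v)$ denotes the set of components of the tuple $v$.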

The previous definition of the o-value poset subsumes the partial ordering which is usually defined using the subtype relationship [18]. The latter can be obtained by restricting the set of all o-values $\mathcal{V}$ to types $\mathcal{V}_T$ in the above definition. Similarly, the o-value poset also describes the instantiation relationship between types and values: instances are more specific than their types. Furthermore, the o-value poset can be seen as composed of the following two partially ordered sets. Firstly, the isa poset is the ordering of object identifiers. Secondly, the values of object identifiers form the partially ordered set of structured values. If there exists a subclass or an instantiation relationship between two oids, then there exists a subtype or instantiation relationship between the corresponding two values of these object identifiers. This can be expressed by the following Corollary. Note that $id_1$ and $id_2$ can refer to the class or to the data objects.

Corollary 1 Let $id_1, id_2 \in \mathcal{O}$ such that $id_1 \preceq_i id_2$, then $\nu(id_1) \preceq_o \nu(id_2)$ holds.

Proof. We first prove the case when $id_1$ and $id_2$ refer to classes. The values $\nu(id_1)$ and $\nu(id_2)$ represent the static types of the classes $id_1$ and $id_2$, which can be obtained by the function $\tau^{*}$. Since $id_1 \preceq_i id_2$ and $\tau^{*}(c)$ for a given class $c$ returns a union of properties that pertain to the class $c$ and its superclasses, we conclude that $\nu(id_1) \preceq_o \nu(id_2)$. Where $id_1$ is an instance object, we first map it to its parent class $c_1$. Now we can prove that $\nu(c_1) \preceq_o \nu(id_2)$ in a similar manner as in the first case. We can easily see that $\nu(id_1) \preceq_o \nu(id_2)$, since $\nu(id_1) \preceq_o \nu(c_1)$ and $\nu(c_1) \preceq_o \nu(id_2)$. $\Box$

Structural inheritance

The inherited type assignment function (Definition 8) can return a type which includes two attributes with the same name. Therefore, a problem arises when we would like to access the value of such an attribute. There are two types of conflicts. In the first case, inherited attributes with the same name are defined for classes that are related by the subclass relationship. In the second case, the cause of the conflict is multiple inheritance. In this situation, the class inherits two or more attributes with the same name from its superclasses, which are not related by the subclass relationship.

The first type of conflict is resolved using the overriding principle. In the case of a name conflict, the attribute which is closest with respect to the isa poset is chosen. Hence the attribute that is defined for a given class overrides attributes with the same name defined for more general classes. For example, the attribute surname of the class married_woman overrides the attribute surname defined for persons. Still, according to Definition 8, both attributes are defined for the particular description of a married woman. The overridden attribute can be accessed by explicitly stating the class of its definition. An additional property of the overriding principle is required in the OVAL data model. The type of the attribute $A$, which overrides the attribute $A'$, must be more specific than the type of the attribute $A'$. This property is specified by the following definition.

Definition 10 (attribute specialization) Let $C_1$ and $C_2$ be classes, $A \in Attr(C_1)$ and $A \in Attr(C_2)$, then the following constraint must hold:

if $C_1 \preceq_i C_2$ then $\nu(C_1).A \preceq_o \nu(C_2).A$

Note that the valuation function $\nu$ is used to obtain the value of the class. The dot operator is used to select the type of the attribute $A$ from the class value. The property expressed by the above definition is necessary for the correct operation of the type checking algorithm. The second type of conflict, which can arise in the case of multiple inheritance, is resolved in the same manner as the overridden attributes are accessed: if a name conflict arises, the user must explicitly state the class where the attribute is defined.

Type interpretation

The close integration of the class and the type concepts provides a clearer interpretation of types, which can now be defined as a straightforward extension of the class interpretation. The type interpretation is defined as follows. Note that in the following definition, $\pi^{*}_c$ denotes the inherited interpretation of the class.

Definition 11 (Type interpretation) Let $T \in \mathcal{V}_T$. With respect to the structure of the type $T$, its interpretation $\pi(T)$ is: (i) $T \in \mathcal{O}_C$: $\pi(T) = \pi^{*}_c(T)$, (ii) $T = \{S\}$: $\pi(T) = \{o \mid o \subseteq \pi(S)\}$, or (iii) $T = [A_1 : T_1, \ldots, A_n : T_n]$: $\pi(T) = \{[A_1 : t_1, \ldots, A_n : t_n] \mid t_i \in \pi(T_i)\}$.

This definition of the type interpretation presents the instantiation relationship. The instances of the type $T$ are the elements of the interpretation of $T$. The definition describes instances of classes as well as instances of structured types, i.e. object identifiers and structured values. The substitutability principle [54] allows a variable, a tuple component or a set element of type $T$ to be an instance of the type $T$ or an instance of a subtype of $T$. The above definition of the type interpretation allows the use of the substitutability principle for object identifiers, but it cannot be used for structured values. The substitutability principle for structured o-values is provided by the inherited interpretation of the type.

Definition 12 (Inherited type interpretation) Let $T \in V_T$. The inherited interpretation of the type $T$ is

$$\pi^{*}(T) = \bigcup_{T_j \preceq_o T \,\wedge\, T_j \in V_T} \pi(T_j).$$

The inherited interpretation of the type $T$ includes the instances of the type $T$ and the instances of the subtypes of $T$. For example, the inherited interpretation of the type [name : string, works_for : organization] is the union of ordinary type interpretations:

$\pi^{*}([name : string, works\_for : organization]) = \pi([name : string, works\_for : organization]) \cup \pi([name : string, works\_for : institute]) \cup \ldots \cup \pi([name : string, age : int, works\_for : organization]) \cup \ldots$

3.3 Modelling behavior

The behavior of instances which belong to the class c is specified by the set of methods. Each method is defined by the signature and the implementation of the method, i.e. an algorithm that computes an object from the set of parameter objects. Each method is a function which returns a value. The signature specifies the name of the method and the structure of objects that take part in the method evaluation.

Definition 13 (signature) The signature of a method $m$ defined on a class $c_0$ is an expression $m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$, where $m$ is the signature name and $c_i$ ($1 \le i \le k$) and $c$ denote types.


The signature $m$ can be treated as an interface of the method which can be applied to the instances of the class $c_0$. The signature defines the directed relationship among the instances of types which take part in the signature definition. Therefore, the signature interpretation is the set of all partial functions from the Cartesian product of the parameter type interpretations to the interpretation of the type of the method result.

Definition 14 (signature interpretation, Lecluse [48]) The interpretation of the signature $s = m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$, denoted by $\pi(s)$, is the set of all partial functions from $\pi(c_0) \times \ldots \times \pi(c_k)$ to $\pi(c)$.

3.3.1 Signature properties

In the following paragraphs, we present some properties of the signatures which derive from the previously given definition of the signature interpretation. We say that the signature $s_1$ is valid for the method $m$ described by the signature $s$ if $\pi(s_1) \subseteq \pi(s)$. First, the input types of the signature can be replaced by more specific types. This primarily means that the signature $c_0 \times \ldots \times c_k \rightarrow c$ is inherited by the subclasses of $c_0$.

Corollary 2 (inheritance) Let $s = m : c_0 \times \ldots \times c_k \rightarrow c$ be a signature of the method $m$ defined by the class $c_0$ and let $c_a \preceq_i c_0$. Then $s_1 = m : c_a \times \ldots \times c_k \rightarrow c$ is a valid signature for the method $m$.

Proof. The corollary is correct since the definition of the signature interpretation states that the method $m$ can be applied to the elements of the inherited interpretation of $c_0$. This set also includes the interpretations of all classes that are more specific than $c_0$ (e.g. $c_a$). Hence, $\pi(s_1) \subseteq \pi(s)$ and the signature $s_1$ is valid. □

The objects that are used as the parameters of the method can be instances of any type that is more specific than the type specified by the signature.

Corollary 3 (input-type restriction) Let $m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$ be the signature of the method $m$ defined by the class $c_0$. Then the signatures $m : c_0 \times c'_1 \times \ldots \times c'_k \rightarrow c$, where $c'_j \preceq_o c_j$, $j \in [1..k]$, are valid signatures for the method $m$.

This property can also be simply proven from the definition of the signature interpretation. The property is stated as a separate invariant in [18] and is defined as the input-type restriction in [41]. Similarly, the output type of the signature can also be restricted.

Corollary 4 (output-type restriction) Let $m : c_0 \times \ldots \times c_k \rightarrow c$ be a signature of the method $m$ defined on a class $c_0$. Then the signature $m : c_0 \times \ldots \times c_k \rightarrow c'$, where $c' \preceq_o c$, is also a valid signature of the method $m$.

This property can again be easily derived from the definition of the signature interpretation. The intuitive reasons for the stated property are as follows: the method with the output type $c$ manipulates properties of $c$, which are also defined by any of its subtypes, while they need not be defined for instances of supertypes of $c$. Similarly, if $c$ is a class, then the method $m$ can manipulate the properties of instances of any subclass of $c$, since they have at least the properties defined for the class $c$.

3.3.2 Signature poset

The partial ordering of signatures can be defined in a similar way to that in which the partial ordering of o-values is defined by the relationship more specific. The partial ordering of signatures is defined using the relationships among signature interpretations.

Definition 15 (partial order, Lecluse [48]) Let $m_1$ and $m_2$ be signatures. The signature $m_1$ is more specific than the signature $m_2$, or $m_1 \preceq_o m_2$, iff $\pi(m_1) \subseteq \pi(m_2)$.

This definition says that we can use a method described by the signature $m_2$ in any place where the method with the signature $m_1$ is used. This interpretation is meaningful, since the interpretation of $m_2$ subsumes the interpretation of $m_1$. In other words, the properties of the input and output parameter types of $m_2$ are more general than the parameter types of $m_1$. This means that parameter objects to which the method $m_1$ can be applied contain all necessary components, so that $m_2$ can also be applied to them. The definition of the partial ordering of signatures can be restated through the following Theorem.

Theorem 1 (signature poset, Lecluse [48]) Let $s_1$ and $s_2$ be signatures, such that $s_1 = m_1 : a_1 \times \ldots \times a_k \rightarrow a$ and $s_2 = m_2 : b_1 \times \ldots \times b_l \rightarrow b$. Then $s_1 \preceq_o s_2$ iff $k \le l$, $a_i \preceq_o b_i$ for all $i \in [1..k]$, and $a \preceq_o b$.

Proof. Without loss of generality, we assume that the left sides of signatures $s_1$ and $s_2$ include only one type, i.e. $s_1 = m_1 : a_1 \rightarrow a$ and $s_2 = m_2 : b_1 \rightarrow b$. First, if $s_1 \preceq_o s_2$ then $\pi(s_1) \subseteq \pi(s_2)$. We can conclude from the definition of signature interpretation that $\pi(a_1) \subseteq \pi(b_1)$ and $\pi(a) \subseteq \pi(b)$. This implies that $a_1 \preceq_o b_1$ and $a \preceq_o b$. The reverse can be proved in a similar way. Since $a_1 \preceq_o b_1$ and $a \preceq_o b$, we can conclude that $\pi(s_1) \subseteq \pi(s_2)$ and hence $s_1 \preceq_o s_2$. □

The definition of the partially ordered set of signatures given by the above Theorem differs from the classical definition of signature subtyping stated by Cardelli in [18]. To demonstrate the difference, we use, as in the previous proof, signatures $s_1 = m_1 : a_1 \rightarrow a$ and $s_2 = m_2 : b_1 \rightarrow b$. The subtyping rule in [18] is stated as

if $a_1 \ge b_1$ and $a \le b$ then $m_1 : a_1 \rightarrow a \le m_2 : b_1 \rightarrow b$

As can be seen by comparing the rule stated in Theorem 1 and the above rule stated by Cardelli, the condition between $a_1$ and $b_1$ is inverted in Cardelli's definition. In other words, a more general method restricts the parameter domain of the function in the case of the latter rule, while in our rule, a more general signature also has a more general function parameter domain. The use of Cardelli's rule guarantees safe compile-time type checking [18]. On the other hand, the rule given in Theorem 1 follows logically from the definition of signature interpretation and provides a more flexible type system.

Some properties of behavioral inheritance are presented in the following paragraphs. An important property of behavioral inheritance is method overriding. Let $s_1 = m : a_0 \times \ldots \times a_k \rightarrow a$ and $s_2 = m : b_0 \times \ldots \times b_l \rightarrow b$ be the signatures of methods $m_1$ and $m_2$, such that $b_0 \preceq_i a_0$ and $s_2 \preceq_o s_1$. For any object $o \in \pi(b_0)$, the invocation of a method named $m$ on the object $o$ causes the evaluation of method $m_2$. We say that method $m_2$ overrides method $m_1$.

Where a class inherits from two superclasses which are not related by the inheritance hierarchy, two methods with the same name can appear in the description of the behavior of this class. This situation can arise with multiple inheritance [18, 41]. There are many different suggestions as to how the multiple inheritance conflict could be handled [41]. Here, we adopt a simple solution used in object-oriented programming languages, e.g. C++. We leave the choice of the correct method to the programmer, who must explicitly identify the class of the method in the event of a multiple inheritance conflict. The reason for this decision is practical, since the OVAL query language is intended to extend the functionality of a database programming language based on C++.
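The contrast between the rule of Theorem 1 and Cardelli's rule can be illustrated with a small sketch. It is not part of OVAL: the Python encoding of signatures as `(input_types, output_type)` pairs, the helper names `more_specific`, `covariant_leq` and `cardelli_leq`, and the reduction of the more specific relationship to plain subclassing are assumptions made for illustration. The class names are borrowed from the University schema of Section 4.2, and the arity condition of Theorem 1 is read here as k <= l.

```python
# Toy hierarchy: class -> list of direct superclasses (names from Section 4.2).
PARENTS = {
    "person": [],
    "employee": ["person"],
    "lecturer": ["employee"],
    "professor": ["lecturer"],
}

def more_specific(a, b):
    """True if class a equals b or is a (transitive) subclass of b."""
    return a == b or any(more_specific(p, b) for p in PARENTS[a])

def covariant_leq(s1, s2):
    """Rule of Theorem 1 (assumed arity condition k <= l):
    every input type and the output type of s1 are more specific."""
    (ins1, out1), (ins2, out2) = s1, s2
    return (len(ins1) <= len(ins2)
            and all(more_specific(a, b) for a, b in zip(ins1, ins2))
            and more_specific(out1, out2))

def cardelli_leq(s1, s2):
    """Classical function subtyping: inputs contravariant, output covariant."""
    (ins1, out1), (ins2, out2) = s1, s2
    return (len(ins1) == len(ins2)
            and all(more_specific(b, a) for a, b in zip(ins1, ins2))
            and more_specific(out1, out2))

s1 = (["professor"], "lecturer")   # m1 : professor -> lecturer
s2 = (["employee"], "employee")    # m2 : employee  -> employee

print(covariant_leq(s1, s2))  # True
print(cardelli_leq(s1, s2))   # False
```

Under the rule of Theorem 1 the signature with the narrower input domain is the more specific one, whereas Cardelli's rule rejects the same pair because the input types would have to be related in the opposite direction.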


3.4 Concluding Remarks

The formalization of the structural and behavioral parts of the object-oriented database model is presented in this chapter. The salient feature of the formalization is the uniform treatment of the schema and instance levels of the database. A database is seen as a uniform set of elements ordered according to the more specific relationship. This view provides a clear interpretation of database model constructs such as class and type. The class object can play the role of a type when it represents a reference type. As a consequence, the definition of the ordinary interpretation for types is actually an extension of the class interpretation. The inherited interpretations of a class and a type are defined analogously, since both of them are based on the inheritance hierarchy of classes.

As presented in the following chapter, the elements of the presented data model formalization are used for the definition of the set of operations for querying the conceptual schema. Although from one point of view there is no distinction between the schema and the instance levels of the database, we continue to treat them as separate sets of entities. Furthermore, the restriction whereby the attribute of the instance object cannot reference the class object or a type is imposed by the target programming language data model (i.e. C++). By removing this restriction, the attribute describing the type of person employment, for instance, could range over the subclasses of the class employee (e.g. teacher or miner). The problem which then arises is the specification of the type of such an attribute: the interpretation of the class person by definition does not include its subclasses. To model such relationships, the model would have to be extended.


Chapter 4 Object Algebra

4.1 Introduction

The characteristics of a database algebra always reflect the properties of the algebraic structures for which the algebra operations are intended. From this point of view, the main differences appearing in the object algebra in comparison with relational and NF² algebra can be seen as consequences of the differences in data models. The relational data model allows the definition of two-dimensional relations. The NF² model extends the relational data model by removing the first normal form requirement, thereby allowing attribute values to be relations. The object-oriented model further extends the modelling capabilities of the NF² model by allowing the definition of data structures constructed by arbitrary use of the tuple and the set type constructors. Another feature of the object-oriented model not used in the relational and NF² algebra, but which has significant influence on the object algebra, is the inheritance principle. This is used to define the classification of objects in the database and to share the structure and behavior across instances of different classes.

The consequences of a more complex composition structure of objects for the algebra lie in the need to access and manipulate object components. The algebra should provide a set of operations which allow querying an arbitrary component of an object. Recent object algebras provide several ways to perform operations on nested object components. Most often, access to the nested components of objects is provided by the use of restructuring operations: group, nest, unnest and flatten [82, 70, 52]. The nested components can be manipulated by first unnesting the component, applying the query to the unnested component and then nesting the components again. Some disadvantages of this approach are listed in Sections 1.1 and 2.2.

An alternative approach to manipulating nested components of objects is the use of the replace operation suggested by Abiteboul and Beeri in [5]. A similar but less powerful operation is proposed in [70]. We see the main drawback of the approach used in [5] as the need to specify the complete structure of the resulting object when the user would like to apply a query to the nested component and retain the structure of the original object, or make slight changes to the object structure.¹

To provide a simple means of querying object components, we propose a generalization of the higher-order function ApplyToAll [10]. The newly defined operation, called apply_at, provides a simple declarative tool for evaluating a query on an arbitrary component of the argument object. The apply_at operation has two parameters: the query expression and the path expression. The evaluation of the parameter query is transferred to the object component identified by the parameter path expression. The operation is simple to understand and is appropriate for use as a declarative query language construct.

Another consequence of the more expressive data model is the need for facilities which allow querying of the database conceptual schema. Due to the rich modelling constructs of the object data model, object-oriented database systems store information about the modelling environment in terms of data objects and conceptual schema. We observe that some information about the modelling environment has been moved from the "instance" part to the "schema" part of the database, if we compare the object-oriented database to the relational one. Therefore, the algebra should provide operations that allow manipulation of schema information. For this purpose, the OVAL data model provides a unified view of a database in which classes are treated in the same way as ordinary objects.
This allows the definition of the set of operations for querying class properties, browsing the inheritance hierarchy of classes, relating classes and instance objects by the partial ordering relationship more specific,² and others.

The presented object algebra is consistent with the features of the previously presented data model. The algebra involves (i) the set of basic algebraic operations and (ii) the set of model-based operations. The basic algebraic operations are used for querying a database. These operations allow filtering of the sets of objects, restructuring objects, calculating least fixpoint queries, manipulating nested object components, and some other facilities detailed in Section 4.4. The algebra operations form the basis for the functional language. Every basic algebraic operation is a function which can be combined into queries using the function composition operation.

The model-based operations are used to access and manipulate object properties expressed by means of data model constructs. They are mainly used to assist the algebraic operations in expressing database queries. They provide facilities for obtaining an object value or an extension of a class object, browsing class object properties, relating instance objects to classes, expressing equality among o-values, etc. All model operations are derived from the concepts used in the data model formalization presented in the previous chapter.

This chapter is organized as follows. The following section includes the definition of the simple language for defining a conceptual schema and a sample database environment used in this chapter. Section 4.3 presents the OVAL data model operations. Next, Section 4.4 includes definitions and examples of the OVAL basic algebraic operations. Some well-known operations of other algebras are expressed in terms of OVAL algebra operations in Section 4.5. Concluding remarks and some areas for further study are given in the last section.

¹ A brief overview of Complex Object Algebra features is given in Subsection 2.2.4 and some comments on the replace operation are given in Subsection 1.1.3.
² The partial ordering relationship more specific is defined in Section 3.2.3.

4.2 Example

The examples in the following sections are based on a conceptual schema which describes a simplified University environment. We use a simple language for the definition of classes, types and variables. As far as possible, we use the constructs already defined by the OVAL data model formalization in the previous chapter.

class school
  type static [ name: string, depts: {department}, head: professor ];
end_type;

class department
  type static [ name: string, head: employee, staff: {employee}, secretary: employee ];
end_type;

class course
  type static [ title: string, instructor: lecturer ];
end_type;

class person
  type static [ name: string, friends: {person}, lives_at: string, age: int ];
end_type;

class employee isa person
  type static [ salary: int, manager: employee, dept: department ];
  function avr_sal: int;
end_type;

class student isa person
  type static [ courses: {course} ];
end_type;

class lecturer isa employee;
class assistant isa lecturer;
class professor isa lecturer;
class stud_assist isa assistant, student;

Each class is defined by its name, a set of superclasses and a type. The name of the class is specified after the keyword class. After specifying the class name, the list of superclasses can follow the keyword isa. The type of the class can be specified after the keyword type. The static type of a class is specified by an o-value as introduced in the previous chapter. The behavior of class instances is specified by declaring the set of methods by means of signatures. We are not concerned with the implementation of methods. The variables used in queries are defined by the keyword var followed by the variable name and its type. For example, the following expression specifies the variable named pset, whose static type is {[name : string, age : int]}.

var pset: {[name:string,age:int]};

4.3 Model-based operations

In this section, we present a set of operations used mainly to assist the basic algebraic operations in querying object properties expressed by the data model constructs. Among other things, they allow the user to relate data objects to classes, compare classes with respect to the inheritance hierarchy of classes, express different kinds of equality among objects and browse the conceptual schema. The salient feature provided by model-based operations is the ability to query schema information.

There are basically two reasons for querying a database conceptual schema. First, the user would like to inquire about the relationships between the data objects and the inheritance hierarchy of classes. For example, the user would like to know if the object jim is an instance of the class student. Similarly, one would like to inquire if the set of jim's superclasses contains the class employee. Second, the user would like to browse the schema in order to obtain a precise mental image of a conceptual schema [58]. Such browsing facilities would be needed, for example, if the user wanted to add a new class to a rich conceptual schema. Browsing the conceptual schema would then be used to detect similar and related classes in a database.

The following data model operations are described: valuation function, extension function, comparison operations, closure operations, lub-set and glb-set operations, and equality operations. The use of these operations is illustrated by a set of examples.

4.3.1 Valuation operator

The properties of a given class object or data object can be obtained using the valuation operator val, which realizes the previously defined valuation function $\nu$. The valuation operator maps an object identifier to the corresponding value. It is defined for all object identifiers, i.e. those referring to instance objects and to class objects.

Example 1 Let s1 be an oid which is an instance of the class object student. Here, the label student denotes an oid referring to the class named "student". The expressions s1.val and student.val denote the values of the object identifiers s1 and student.

s1.val = [name : "martin", age : 20, lives_at : a1, avr_grade : 8, courses : {c1, c2, c3}]
student.val = [name : string, age : int, lives_at : address, avr_grade : int, courses : {course}]

The use of the valuation operation is usually abbreviated when the valuation operation is followed by another function. For example, the expression s1.val.name can be simplified by the use of the dereferencing operator "->", resulting in the expression s1->name.

4.3.2 Extension functions

We defined two types of class interpretations in the OVAL data model. The first interpretation maps a class object to the set of its instances. This operator is usually called the class extension [9]. We will call this operator the extension operator ext. The second class interpretation maps a class object to the set of instances of this class and all its subclasses. This operator is denoted by the name exts.

Example 2 This example illustrates the use of extension functions. The expression in the example is based on the class hierarchy presented in Section 4.2. The following expression denotes the set of persons who are younger than 22 years and are direct instances of the class employee. Note that the value of person.exts includes the set of persons plus the instances of the student, employee, lecturer, assistant, professor and stud_assist classes.

$\{o;\ o \in person.exts \wedge o{\rightarrow}age < 22 \wedge o \in employee.ext\}$
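The difference between ext and exts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `PARENTS` and `OBJECTS`, the helper `is_subclass` and all instance data are hypothetical; only the class names and the shape of the query follow Section 4.2 and the conjunction in the expression of Example 2.

```python
# Direct superclasses of each class, following the schema of Section 4.2.
PARENTS = {
    "person": [], "employee": ["person"], "student": ["person"],
    "lecturer": ["employee"], "assistant": ["lecturer"],
    "professor": ["lecturer"], "stud_assist": ["assistant", "student"],
}

# Hypothetical instances: oid -> (class, value). The data is invented.
OBJECTS = {
    "p1": ("person", {"name": "Ana", "age": 30}),
    "e1": ("employee", {"name": "Bor", "age": 21}),
    "e2": ("employee", {"name": "Cene", "age": 45}),
    "s1": ("student", {"name": "Dana", "age": 20}),
    "a1": ("stud_assist", {"name": "Eva", "age": 21}),
}

def is_subclass(c, d):
    """True if c equals d or inherits from d, directly or transitively."""
    return c == d or any(is_subclass(p, d) for p in PARENTS[c])

def ext(c):
    """Ordinary extension: oids whose class is exactly c."""
    return {oid for oid, (cls, _) in OBJECTS.items() if cls == c}

def exts(c):
    """Inherited extension: oids of c and of all its subclasses."""
    return {oid for oid, (cls, _) in OBJECTS.items() if is_subclass(cls, c)}

# {o; o in person.exts and o->age < 22 and o in employee.ext}
young_employees = {o for o in exts("person")
                   if OBJECTS[o][1]["age"] < 22 and o in ext("employee")}
print(sorted(young_employees))  # ['e1']
```

The object a1 is younger than 22 and belongs to employee.exts, but it is excluded because the query tests membership in the ordinary extension employee.ext.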

4.3.3 Comparison operations based on o-value poset

The simplest and most natural way to express object properties which relate to the partial ordering of o-values is to use comparison operations. The o-value poset is defined using the relationship $\preceq_o$. The relationships $\prec_o$, $\succeq_o$ and $\succ_o$ are also defined. Their semantics is standard, e.g. $a \prec_o b \iff a \preceq_o b \wedge a \neq b$. We call these operations poset comparison operations.

To allow us to relate instance objects to their classes, we introduce the function class_of, which maps instance objects to their parent classes. Formally, $x.class\_of = c \iff x \in \pi_c(c)$, where $\pi_c(c)$ denotes the interpretation of the class $c$. Note that an instance belongs to exactly one class interpretation. By using the poset operations to relate oids, we can define a subset of the isa poset such that its elements possess "certain" properties. Let us illustrate the use of poset comparison operations by means of an example based on the class hierarchy from Section 4.2.

Example 3 The expression in this example specifies objects that are more specific than the class employee and are at the same time instances of either the class stud_assist or some more general class.

$\{o;\ o \in person.exts \wedge o \prec_o employee \wedge stud\_assist \preceq_o o.class\_of\}$

The resulting set of objects includes instances of the classes lecturer, assistant and stud_assist.

The comparison operations presented in the previous paragraph are based on the subset of the o-value poset previously defined as the isa poset $(O, \preceq_i)$. The isa poset allows us to relate complete objects. In the following example, we present the use of the more specific relationship to relate object properties in terms of structured values.

Example 4 The expression in this example selects the values of objects of the class person that have the attributes manager, friends and lives_at defined. The value of the attribute lives_at must be the string "Brisbane". The value of the attribute manager is required to be more specific than the class lecturer. Similarly, the value of the attribute friends is required to be more specific than the type {student}.

$\{o;\ o \in person.exts.val \wedge o \preceq_o [manager : lecturer, friends : \{student\}, lives\_at : \text{"Brisbane"}]\}$

4.3.4 Closure operations

The transitive closure operations subcl and supcl are defined on the class objects. The operation subcl computes all subclasses of a given class, and the operation supcl computes all superclasses of a given class.

Example 5 The expression presented in Example 3 can now be stated as follows.

$\{o;\ o \in person.exts \wedge o.class\_of \in employee.subcl \wedge o.class\_of \neq employee \wedge o.class\_of \in stud\_assist.supcl\}$

The transitive closure operations express exactly the same relationships among classes and objects as the comparison operations based on the isa poset. The expression $x \preceq_o y$, where $x$ and $y$ are classes, can be translated to $x \in y.subcl$. In a similar way, the expression $x \prec_o y$ can be translated to $x \in y.subcl \wedge x \neq y$. While the comparison operations can only serve for expressing relationships among objects, the result of a closure operation is a set of classes that can be further queried.
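As a rough illustration of the closure operations, the class part of the query in Example 5 can be sketched in Python. The functions `supcl` and `subcl` below are hypothetical stand-ins, not the OVAL implementation. Following the translation given above (x more specific than y becomes x ∈ y.subcl), they are assumed to be reflexive, i.e. a class is contained in its own closure.

```python
# Direct superclasses of each class, following the schema of Section 4.2.
PARENTS = {
    "person": [], "employee": ["person"], "student": ["person"],
    "lecturer": ["employee"], "assistant": ["lecturer"],
    "professor": ["lecturer"], "stud_assist": ["assistant", "student"],
}

def supcl(c):
    """All superclasses of c, including c itself (assumed reflexive)."""
    result = {c}
    for parent in PARENTS[c]:
        result |= supcl(parent)
    return result

def subcl(c):
    """All subclasses of c, including c itself (assumed reflexive)."""
    return {d for d in PARENTS if c in supcl(d)}

# Classes whose instances are selected by the expression of Example 5.
selected = {c for c in subcl("employee")
            if c != "employee" and c in supcl("stud_assist")}
print(sorted(selected))  # ['assistant', 'lecturer', 'stud_assist']
```

The class professor is a subclass of employee but not a superclass of stud_assist, so it is not selected, which agrees with the result stated for Example 3.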

4.3.5 The nearest common superclasses and subclasses

The common properties of a set of classes are defined by the classes that are superclasses of all classes from the set. For example, the common properties of the set {professor, assistant} are defined by the classes person, employee and lecturer. Since the class lecturer inherits the properties of the classes employee and person, all common properties are defined by the class lecturer. Therefore, it is enough to state the nearest common superclass. Computing the nearest common superclass of a set of objects corresponds to the least upper bound operation [60].

For a given set of classes from the isa poset, there can exist several common superclasses which are not related by the subclass relationship $\preceq_i$.³ For example, suppose that the isa poset is described by the expressions $phd\_student \preceq_i student$, $phd\_student \preceq_i employee$, $assistant \preceq_i employee$ and $assistant \preceq_i student$. The nearest common superclasses of the set {phd_student, assistant} are the classes student and employee. The operation lub-set is defined to calculate the set of nearest common superclasses of a given set of classes.

Definition 16 Given the poset $P$ and the set $A$ such that $A \subseteq P$, the lub-set of $A$, denoted by A.lub-set, is the set $L$ with the following properties:

1. each element of $L$ is related to each element of $A$ by means of the $\preceq_i$ relationship, and
2. all other elements of $P$ that are related to each element of $A$ are more general than at least one element of $L$.

The glb-set operation is defined similarly. The glb-set operation finds the set of nearest common subclasses for a given set of classes. The elements of the resulting set are objects which capture all properties that pertain to the objects from the argument set.

Example 6 In this example we present the use of the lub-set operation. The expression shown first determines the nearest common superclass of the classes stud_assist and professor according to the class hierarchy presented in Section 4.2. The extension function is applied to all lub classes; lecturer in this example.

$\{o;\ c \in \{stud\_assist, professor\}.lub\text{-}set \wedge o \in c.ext\}$

³ The isa poset is not a lattice.
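The lub-set operation of Definition 16 can be sketched as follows. The function names `supcl` and `lub_set` and the dictionary encoding of the isa poset are assumptions of this sketch; the closure is taken to be reflexive, so a class counts as its own superclass. Both hierarchies used below are the ones described in the text.

```python
def supcl(parents, c):
    """All superclasses of c, including c itself (assumed reflexive)."""
    result = {c}
    for parent in parents[c]:
        result |= supcl(parents, parent)
    return result

def lub_set(parents, classes):
    """Nearest common superclasses: the minimal common upper bounds."""
    common = set.intersection(*(supcl(parents, c) for c in classes))
    # Keep a class only if no other common superclass lies below it.
    return {c for c in common
            if not any(d != c and c in supcl(parents, d) for d in common)}

# Class hierarchy of Section 4.2.
UNIVERSITY = {
    "person": [], "employee": ["person"], "student": ["person"],
    "lecturer": ["employee"], "assistant": ["lecturer"],
    "professor": ["lecturer"], "stud_assist": ["assistant", "student"],
}

# Alternative isa poset described in Section 4.3.5.
ALTERNATIVE = {
    "student": [], "employee": [],
    "phd_student": ["student", "employee"],
    "assistant": ["employee", "student"],
}

print(sorted(lub_set(UNIVERSITY, {"stud_assist", "professor"})))   # ['lecturer']
print(sorted(lub_set(ALTERNATIVE, {"phd_student", "assistant"})))  # ['employee', 'student']
```

The second call returns two unrelated classes, which is why the operation yields a set rather than a single least upper bound.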

4.3.6 Equality

Two types of equality operations are introduced for the algebra. The first operation is the identity equality [15, 70]. Two objects are identical if they have equal object identifiers. This equality is denoted by the symbol "==". The second equality operation is the value equality. It compares objects on the basis of their value. We distinguish between two types of value equality: deep equality and class based equality. The first compares two objects by comparing all their components recursively. The operator is denoted by "=". Deep equality is defined as follows.

Definition 17 (Deep equality) Objects a and b are deep equal, or a = b, if one of the following holds:

1. a and b are primitive objects and they have the same value,
2. the values of a and b are tuples with an equal number of attributes $n$ and $a{\rightarrow}A_i = b{\rightarrow}A_i$, where $i \in [1..n]$,
3. the values of a and b are sets, such that there exists a one-to-one mapping $F$ from a.val to b.val and for each pair $(x, y) \in F$, where $x \in a.val$ and $y \in b.val$, the equality $x = y$ holds.

Class based equality allows the comparison of objects whose type is not the same. This operator is denoted by "=/class". To be able to compare two objects on the basis of the properties of the class $C$, both of these objects must inherit from the class $C$. The class based equality is defined as follows.

Definition 18 (Class based equality) Objects a and b are class value equal for a class $C$, or a =/C b, iff

1. $a \preceq_i C$ and $b \preceq_i C$,
2. for all $A_i$, $A_i \in Attr(C)$, deep equality $a{\rightarrow}A_i = b{\rightarrow}A_i$ holds.

Example 7 One example of the use of class value equality is the following. Suppose we compare the objects (i1, [name : "tone", age : 40, friends : {s1}, works_at : ijs, salary : 10000]) and (i2, [name : "vanja", age : 24, friends : {s2, s3}, works_at : ijs, salary : 10000, cour : {c1, c2}, avr_grade : 9]). The first object is an instance of the class employee and the second is an instance of stud_assist. The objects are not deep value equal, yet they are value equal considering the properties that pertain to the class employee, i.e. the attributes works_at and salary.
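Definitions 17 and 18 can be made concrete with a short Python sketch that replays Example 7. Everything about the encoding is an assumption: tuples are modelled as dictionaries, sets as lists, oids by the hypothetical class `Oid`, and `STORE`, `CLASS_OF`, `PARENTS` and `ATTRS` are illustrative lookup tables. The attribute list for employee follows Example 7 (works_at and salary). The sketch does not guard against cyclic object references.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Oid:
    """Object identifier; the value of an object is looked up in STORE."""
    name: str

ijs, i1, i2 = Oid("ijs"), Oid("i1"), Oid("i2")
STORE = {
    ijs: {"name": "IJS"},  # invented value for the referenced object
    i1: {"name": "tone", "age": 40, "friends": [Oid("s1")],
         "works_at": ijs, "salary": 10000},
    i2: {"name": "vanja", "age": 24, "friends": [Oid("s2"), Oid("s3")],
         "works_at": ijs, "salary": 10000,
         "cour": [Oid("c1"), Oid("c2")], "avr_grade": 9},
}
CLASS_OF = {i1: "employee", i2: "stud_assist"}
PARENTS = {"person": [], "employee": ["person"], "student": ["person"],
           "lecturer": ["employee"], "assistant": ["lecturer"],
           "stud_assist": ["assistant", "student"]}
ATTRS = {"employee": ["works_at", "salary"]}  # as listed in Example 7

def inherits(c, d):
    return c == d or any(inherits(p, d) for p in PARENTS[c])

def deep_equal(a, b):
    """Deep equality in the spirit of Definition 17."""
    if isinstance(a, Oid) and a == b:
        return True  # identical objects are trivially deep equal
    a = STORE[a] if isinstance(a, Oid) else a
    b = STORE[b] if isinstance(b, Oid) else b
    if isinstance(a, dict) and isinstance(b, dict):  # tuples
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):  # sets
        if len(a) != len(b):
            return False
        rest = list(b)
        for x in a:  # greedily build the one-to-one mapping F
            i = next((i for i, y in enumerate(rest) if deep_equal(x, y)), None)
            if i is None:
                return False
            rest.pop(i)
        return True
    return a == b  # primitive values

def class_equal(a, b, c):
    """Class based equality in the spirit of Definition 18."""
    return (inherits(CLASS_OF[a], c) and inherits(CLASS_OF[b], c)
            and all(deep_equal(STORE[a][k], STORE[b][k]) for k in ATTRS[c]))

print(deep_equal(i1, i2))               # False
print(class_equal(i1, i2, "employee"))  # True
```

Greedy matching is sufficient for the set case because deep equality is an equivalence relation, so any matching element can be paired without ruling out a complete one-to-one mapping.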

4.4 Algebra operations

Every algebraic operation is a function described by: (i) an argument which represents a set of objects, (ii) a possibly empty set of parameters, and (iii) a result which is an o-value. The type of the resulting o-value depends on the particular operation. Operations can be combined using the composition operation and/or the higher-order functions. A single operation or a composition of operations forms a query. The query can be formally defined as follows:

Definition 19 (query) A query is an expression $o.f_1 \ldots f_n = f_n(\ldots f_1(o) \ldots)$, where $o \in V_D$ and $f_i$ ($1 \le i \le n$) are algebra operations.

The operations of the OVAL algebra are as follows. First, the apply operation is a higher-order function which is used to evaluate a parameter function on a set of objects. The object selection is used for filtering the set of objects. The relational operations union, difference and intersection are defined. The close operation is used for computing the closure of a set of objects using a parameter query. The tuple operation is a generalization of the relational projection. It is a higher-order function which applies the list of parameter queries to the argument set of objects. The group and the unnest operations are used for restructuring o-values. Finally, the apply_at operation is used to evaluate queries on the object components.

The basic algebraic operations may be seen as composed of three groups of operations. The first group of operators relates to the conventional (flat) relational operations that form a set-based algebra. These operations are: selection, union, difference and intersection. The second group of operators involves the two restructuring operators that emerge from NF² algebra [1, 62]. These operations are the group and unnest operations. They cover the functionality of the NF² nest and unnest operations, as well as the object restructuring operations group and flatten [70]. Finally, the third group includes a set of higher-order functions derived from functional languages [10]. These operations are the apply, tuple and apply_at operations. They are used to apply parameter queries to arbitrary components of the argument object and to generate a set of tuple structured objects.

In this section we describe each operation of the algebra. The operations are first defined in a formal manner. The functionality of the operations is then described using one or more examples.
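The composition in Definition 19 reads left to right: the first operation written is the first one applied. A minimal Python sketch of this reading is given below. The helper `query` and the simplified `apply` and `select` operations are illustrative names chosen for this sketch, not OVAL syntax, and the data is invented.

```python
from functools import reduce

def query(*ops):
    """Compose operations so that query(f1, ..., fn)(o) == fn(...f1(o)...)."""
    return lambda o: reduce(lambda acc, f: f(acc), ops, o)

def apply(f):
    """Simplified apply: evaluate f on every element of the argument set."""
    return lambda s: {f(o) for o in s}

def select(p):
    """Simplified selection: keep the elements satisfying predicate p."""
    return lambda s: {o for o in s if p(o)}

ages = {19, 21, 34}  # invented argument o-value
q = query(select(lambda a: a < 30), apply(lambda a: a + 1))
print(sorted(q(ages)))  # [20, 22]
```

Because every operation maps an o-value to an o-value, any such chain is again a function that can be passed as a parameter to a higher-order operation.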

4.4.1 Apply

The operation apply(f) is used to evaluate a parameter function f on the elements of the argument set. The parameter function f can be an attribute, a method or another query. The definition of the operation apply is given below.

Definition 20 (operation apply) Let $s$, $s \in V_D$, be a set of o-values and let $f$ be a query. The result of evaluating the operation apply(f) on the set $s$ is the following:

$s.apply(f) = \{z;\ o \in s \wedge z = f(o)\}$

Example 8 An example of the use of the operation apply is given below. The presented query maps a set of students (i.e. object identifiers) onto a set of student names. The identity function id is used to identify the elements of the set studs, which is the argument of the function.

var studs: {student};
var str: {string};
str = studs.apply(id->name);

Suppose the value of the variable studs is the set {s1, s2, s3, s4}, where the si are oids representing instances of the class student. The result of the query, i.e. the value of the variable str, is, for instance, the set {"Tone", "Marjan", "Miha", "Matej"}.

As mentioned above, the apply parameter function f can be any query. The following example illustrates the case where the nested apply operation is used to access nested sets.

Example 9 The following example illustrates the case where the parameter of the apply operation is another apply operation. This provides a tool for accessing nested sets (e.g., set of sets).

    var isset: {{instructor}};
    isset = student.ext.
        apply(id->cour).
        apply(apply(id->instructor));

The above query first maps each student to the set of his courses, where the result is a set of sets. The nesting of the apply operation provides access to the elements of the nested sets and maps them onto the course instructor. The final result is the set of sets that includes identifiers of instructors. The result of the query is, for instance, the o-value $\{\{i_1, i_2\}, \{i_3, i_4\}, \{i_5\}\}$, where $i_j$ represent oids that are instances of the class instructor.

The syntax of the apply operation application can be abbreviated. Instead of writing s.apply(f), which means that we apply f to the set of objects s, we simply write s.f. This abbreviation makes the query language easier to use and improves the comprehensibility of algebraic expressions. The previous example can now be expressed as:

    isset = student.ext->cour.apply( id->instructor );

4.4.2 Selection

This operation serves for filtering an argument set of o-values using a parameter predicate. The parameter of the selection operator is a predicate which specifies the properties of the selected o-values. Selection is defined as:

Definition 21 (operation select) Let $s \in V_D$ be a set of o-values and let p be a predicate function defined on elements of s. The predicate p returns a boolean value. The result of applying the operation select(p) to the set s is the set of o-values defined as:

$s.\mathrm{select}(p) = \{\, o \mid o \in s \wedge p(o) \,\}$

The type of the result (which is a set of objects) is the same as the type of the input set of objects. When the selection is applied to a set of object identifiers, the existing objects from the database are filtered according to the predicate. If the argument of the selection is a set of object values, the result is a new set of objects. Therefore, selection can perform object generation or object preservation operations [72].

The predicate p is a boolean function specified by an expression that can be composed of simple or complex predicates. The latter is composed of predicates (simple and complex) combined using boolean operations and, or and not. Simple predicates are of the form [q1] t1 op [q2] t2, where t1 and t2 are terms. The value of a term is an o-value. The binary relationship op is stated between the values of terms t1 and t2. The elements of the sets resulting from evaluating terms t1 and t2 can be quantified using the quantification operators q1 and q2 respectively. The use of quantification operations is described shortly. A term can be an o-value specified by a user, a typed variable or a query. The binary operation op, which specifies the relationship between terms, can be a simple comparison operation, a poset comparison operation, a set inclusion operation, or a set membership operation. The simple comparison operations >; courses )-> name;

The result of the above query is a set of strings, e.g. {"Marjan", "Jim", ...}.

As presented by the definition of a predicate, the quantification operator can be placed on either side of the binary operation. In this way, objects which are the result of a term evaluation are implicitly bounded. In the simple predicate [q1] t1 op [q2] t2, the quantification operator q1 relates to the set resulting from the evaluation of t1, and q2 relates to the value of t2. We will describe in detail the case where the result of the left term is quantified. Other cases can be treated in a similar manner.

$q\; o_1 \;\mathrm{op}\; o_2 \iff q\; o \in o_1 : o \;\mathrm{op}\; o_2$

Here q represents the all ($\forall$) or some ($\exists$) quantification operator. To satisfy the above binary relationship when q is an existential quantifier $\exists$, for instance, there must be at least one o-value from the set $o_1$ which satisfies the given binary relationship op.

4 The poset comparison operations are described in more detail in Subsection 4.3.3.
5 The identity and value equalities are defined in Subsection 4.3.6.

Example 11 To illustrate the use of the quantification operator, a query which computes courses taken by at least one student is defined as follows.

    var VisitedCourses: {course};
    VisitedCourses = course.ext.
        select( id in some student.ext->courses );
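The interplay of selection and quantified predicates can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: `some` and `every` are hypothetical helpers standing in for the existential and universal quantification operators, and `COURSES_OF` is invented sample data playing the role of student.ext->courses.

```python
def select(s, p):
    """s.select(p) = { o | o in s and p(o) }."""
    return {o for o in s if p(o)}


def some(collection, rel):
    """Existential quantifier: rel holds for at least one element."""
    return any(rel(x) for x in collection)


def every(collection, rel):
    """Universal quantifier: rel holds for all elements."""
    return all(rel(x) for x in collection)


# Hypothetical data: student oid -> set of course oids (id->courses).
COURSES_OF = {"s1": {"c1", "c2"}, "s2": {"c2"}, "s3": set()}
course_ext = {"c1", "c2", "c3"}

# student.ext->courses yields one set of courses per student.
course_sets = list(COURSES_OF.values())

# select( id in some student.ext->courses )
visited = select(course_ext, lambda c: some(course_sets, lambda cs: c in cs))

# The universally quantified variant: courses taken by every student.
taken_by_all = select(course_ext, lambda c: every(course_sets, lambda cs: c in cs))
```

With this data, `visited` contains c1 and c2, while `taken_by_all` is empty because student s3 attends no course.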

4.4.3 Set operations

The algebra includes set manipulation operations: union, intersection and difference, which are denoted as union, intsc and differ respectively. These operations are defined in a functional manner so that they can serve as parameters of higher-order operations. The union operation is defined as follows:

Definition 22 (set operation union) Let $s \in V_D$ be a set of o-values and let q be a query. The application of the operation union(q) to the set s results in the union of the set s and the result of the query q, denoted by eval(q). Formally,

$s.\mathrm{union}(q) = \{\, o \mid o \in \mathrm{eval}(q) \vee o \in s \,\}$

When the type of s is {T1} and the type of the query q result is {T2}, then the type of the resulting set is {lub(T1, T2)}, where lub is the least upper bound operation. The elements of the argument sets can be oids or values. The deep value equality (see Subsection 4.3.6) is used for computing set operations. Since the semantics of the operations intsc and differ are also standard, we omit the definitions.

Example 12 This example illustrates the use of the operation union. The query described below computes the union of (i) the set of instructors who work in the department e4 and (ii) the set of students who have at least one instructor from this department.

    var InstAndStudE4: {person};
    InstAndStudE4 = instructor.ext.
        select( id->dept->name = "e4" ).
        union( student.ext.
            select( "e4" in id->courses->instructor->dept->name ));
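The result type {person} in Example 12 follows from the least upper bound rule for union. The Python sketch below shows one way lub could be computed; it assumes a single-inheritance class hierarchy, and the `PARENT` table is a hypothetical schema fragment rather than the thesis schema.

```python
# Hypothetical single-inheritance hierarchy: class -> direct superclass.
PARENT = {
    "person": None,
    "student": "person",
    "employee": "person",
    "instructor": "employee",
}


def ancestors(t):
    """Return t followed by its superclasses, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def lub(t1, t2):
    """Least upper bound: the most specific class above both t1 and t2."""
    above_t1 = set(ancestors(t1))
    for t in ancestors(t2):
        if t in above_t1:
            return t
    return None  # no common superclass in this hierarchy


# Element type of the union of {instructor} and {student}.
result_type = lub("instructor", "student")
```

Under multiple inheritance the least upper bound need not be unique, so this sketch covers only the simple tree-shaped case.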

4.4.4 Close

The close(q) operation is defined in order to provide the end-user with a simple tool for the manipulation of recursive data structures. The semantics of the operation close(q) is defined as follows: given an argument set of objects which are instances of a type T, the closure of this set is computed using the parameter query q. The result of the evaluation of the query q must be objects of the type T. The operation is formally defined as:

Definition 23 (operation close) Let $s \in V_D$ be a set of o-values and let q be a query such that $T(q) \preceq_o T(s)$. The result of evaluating operation close(q) on the set s is the closure of the set s under the query q, described as follows.

$s.\mathrm{close}(q) = w$, where
(a) $w = s$, if $s_1 = \{\}$, or otherwise
(b) $w = (s \cup s_1).\mathrm{close}(q)$,

and $s_1$ is defined as follows:

$s_1 = \{\, o_i \mid p_j \in s \wedge o_i \;\mathrm{op}\; p_j.q \wedge o_i \notin s \,\}$.

The operation op is either the equality operation = or the membership operation $\in$, depending on the cardinality of the result of the query q.

The type of the result obtained by applying the operation close on a set of objects s is the same as the type of the input argument set s.

Example 13 The following query first selects the set of instances of the class employee who earn less than $10000. The set is then extended by computing all managers of the selected people.

    var Empl: {employee};
    Empl = employee.ext.
        select( id->salary < 10000 ).
        close( id->manager );
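The recursive definition of close amounts to a fixpoint computation. The following Python sketch follows Definition 23 step by step, written as a loop instead of recursion; the `MANAGER` and `SALARY` tables are hypothetical sample data, and the query q is modelled as a function that returns a (possibly empty) set so that both the equality and the membership case of op are covered.

```python
def close(s, q):
    """Closure of s under q: repeat w := w ∪ s1 until s1 is empty."""
    w = set(s)
    while True:
        # s1 = { o | p in w, o in q(p), o not in w }
        s1 = {o for p in w for o in q(p) if o not in w}
        if not s1:
            return w
        w |= s1


# Hypothetical data: employee oid -> manager oid (None for the top manager).
MANAGER = {"e1": "e2", "e2": "e3", "e3": None, "e4": "e3"}
SALARY = {"e1": 9000, "e2": 20000, "e3": 50000, "e4": 30000}


def manager(e):
    """id->manager, returned as a set so that a missing manager is empty."""
    m = MANAGER[e]
    return set() if m is None else {m}


low_paid = {e for e, pay in SALARY.items() if pay < 10000}
empl = close(low_paid, manager)
```

The loop terminates on cyclic data as well, because an object already in w is never added to s1 again.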


4.4.5 Tuple

The tuple operation is an extension of the relational projection. Given a set of objects as an argument for the operation, a tuple is generated for each object from the argument set. Each component of the newly created tuple is specified by the corresponding tuple parameter, which is composed of the attribute name and the query. The result of the parameter query evaluation serves as the attribute value. Therefore, some properties of input objects can be manipulated and then given as the attribute values of the newly created tuples. The tuple operation is formally presented below.

Definition 24 (operation tuple) Let $s \in V_D$ be a set of o-values, $a_i \in A$ attribute names ($1 \le i \le k$) and $f_i$ queries. The result of evaluating operation tuple($a_1 : f_1, \ldots, a_k : f_k$) on the set of o-values s is the set of tuples described as follows.

$s.\mathrm{tuple}(a_1 : f_1, \ldots, a_k : f_k) = \{\, [a_1 : f_1(o), \ldots, a_k : f_k(o)] \mid o \in s \,\}$

The type of the result obtained by applying the tuple operation to the set of objects is {[a1 : T(f1), ..., ak : T(fk)]}. The tuple operation can also be applied to a single object, producing a single tuple.

Example 14 In this example, the query constructs a tuple for each instance of the class student extension. The tuple consists of the student name and the set of tuples that describe courses taken by the student. Nested tuples include the titles and instructors of the courses. Note that the query is abbreviated by omitting the identity function id.

    var StudentCourses: {[ sname: string, courses: {[ iname: string, title: string ]}]};
    StudentCourses = student.ext->
        tuple( sname: name,
               courses: courses->
                   tuple( iname: instructor->name, title: title ));
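A rough Python analogue of the tuple operation, including the nested use from Example 14, is sketched below. The function is named `tuple_op` to avoid shadowing Python's built-in `tuple`; the three lookup tables are hypothetical sample data, and result sets are modelled as lists of dictionaries because dictionaries cannot be members of a Python set.

```python
def tuple_op(s, **params):
    """For each o in s build the tuple [a1: f1(o), ..., ak: fk(o)]."""
    return [{a: f(o) for a, f in params.items()} for o in s]


# Hypothetical sample data.
STUDENT = {
    "s1": {"name": "Tone", "courses": ["c1", "c2"]},
    "s2": {"name": "Miha", "courses": ["c2"]},
}
COURSE = {
    "c1": {"title": "Databases", "instructor": "i1"},
    "c2": {"title": "Logic", "instructor": "i2"},
}
INSTRUCTOR = {"i1": {"name": "Ana"}, "i2": {"name": "Jim"}}

student_courses = tuple_op(
    sorted(STUDENT),
    sname=lambda s: STUDENT[s]["name"],
    # The parameter query is itself a tuple operation on the nested set.
    courses=lambda s: tuple_op(
        STUDENT[s]["courses"],
        iname=lambda c: INSTRUCTOR[COURSE[c]["instructor"]]["name"],
        title=lambda c: COURSE[c]["title"],
    ),
)
```

Unlike the algebra, this list-based sketch does not eliminate duplicate tuples.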


4.4.6 Group

The operation group(a : f, b : g) is a higher-order function which requires the specification of two parameter functions f and g. The value of the query parameter f serves as the key for creating groups of objects. The parameter function g is used to compute objects that are the subject of grouping. Therefore, functions f and g are evaluated for every element of the input set. The objects which are the results of function g application are grouped according to the results of the function f. The result of the group operation is an o-value whose structure is a two-column table. The first column, labeled a, contains all distinct values of the function f applied to the input set of objects. The second column, labeled b, includes the groups of results of function g application on objects which share the common value of the function f application. The definition of the operation group is given below.

Definition 25 (operation group) Let $s \in V_D$ be a set of o-values, let a and b be attribute names and let f and g be queries. The result of evaluating operation group(a : f, b : g) on the set of o-values s is the set of pairs described as follows.

$s.\mathrm{group}(a : f, b : g) = \{\, [a : o_1, b : o_2] \mid \forall v.g, p.g \in o_2 : v, p \in s \wedge v.f = p.f = o_1 \wedge (\nexists z \in s : z.f = o_1 \wedge z.g \notin o_2) \,\}$

Note that the second component of the resulting table contains sets of o-values obtained by applying the function g to the elements of the mutually disjoint subsets of the original argument set s. The type of the resultant object is {[a : T(f), b : {T(g)}]}.

Example 15 Use of the group operation to group the objects of the class employee according to their departments is presented in this example.

    var EmpGroups: {[ dept: department, emps: { employee }]};
    EmpGroups = employee.ext.
        group( dept: id->dept, emps: id );
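The grouping behaviour can be approximated in Python with a dictionary keyed by the value of f. This is a sketch only: the positional signature `group(s, a, f, b, g)` and the `DEPT_OF` table are assumptions made for illustration, and the key values are assumed to be hashable.

```python
def group(s, a, f, b, g):
    """Group the values g(o) by the key f(o); one tuple [a: key, b: set] per key."""
    groups = {}
    for o in s:
        groups.setdefault(f(o), set()).add(g(o))
    return [{a: key, b: members} for key, members in groups.items()]


# Hypothetical data: employee oid -> department oid (id->dept).
DEPT_OF = {"e1": "d1", "e2": "d1", "e3": "d2"}

# group( dept: id->dept, emps: id )
emp_groups = group(DEPT_OF.keys(), "dept", DEPT_OF.get, "emps", lambda e: e)

# Index the result by department for convenient inspection.
by_dept = {t["dept"]: t["emps"] for t in emp_groups}
```

Every input object lands in exactly one group, so the sets in the second column come from mutually disjoint subsets of the input, as the definition requires.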

4.4.7 Unnest

There are three different "types" of restructuring operations which can be used to unnest structured objects in argument objects. One of the operations produces a flat structure from a set of sets. The second operation is used to unnest the tuple component, which is a set of o-values. The last operation unnests the tuple component, which is a tuple of o-values. These three operations are defined in many algebras, where they are usually denoted by the flatten operation [70], the unnest operation [62], and the tup_collapse operation [5]. A single restructuring operation unnest with different behavior depending on the structure of argument object is defined. This has the advantage of providing the user with a common view as a single operation of the query language. Before introducing the definition of the unnest operation, which allows the use of a single operation for all kinds of argument object structure, we first define each sub-operation.

Definition 26 (operation unnest) Let $s \in V_D$ be a set of o-values and A be an attribute name. The result of evaluating operation unnest on the set of o-values s depends on the structure of the elements of the argument set s and is one of the following.

$T(s) = \{\{T\}\} \wedge T \in V_T$: $s.\mathrm{unnest} = \{\, o \mid s_i \in s \wedge o \in s_i \,\}$

$T(s) = \{[\ldots, A_i : \{T\}, \ldots]\} \wedge T \in V_T$: $s.\mathrm{unnest}(A_i) = \{\, t \mid p \in s \wedge p.A_j = t.A_j \wedge j \neq i \wedge t.A_i \in p.A_i \,\}$

$T(s) = \{[\ldots, A_i : [\ldots], \ldots]\}$: $s.\mathrm{unnest}(A_i) = \{\, t \mid p \in s \wedge t.A_j = p.A_j \wedge j \neq i \wedge t.B_k = p.A_i.B_k \,\}$

Example 16 The use of the unnest operations is illustrated by the following examples. The first query computes all courses in which students are enrolled. The second query computes the set of all pairs that include employees and their departments.

    var DeptEmp: {[ dept: department, emp: employee ]};
    var VisitCour: {course};
    VisitCour = student.ext.cour.unnest;
    DeptEmp = department.ext.
        tuple( dept: id, emps: id->staff ).
        unnest( emps );
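The three cases of the unnest definition can be written as three small Python functions. This is an illustrative sketch, not the prototype's code: the function names are invented, tuples are modelled as dictionaries, and sets of tuples are modelled as lists so that the output order is deterministic.

```python
def unnest_sets(s):
    """Case 1, {{T}} -> {T}: flatten a set of sets."""
    return {o for si in s for o in si}


def unnest_set_attr(s, attr):
    """Case 2, {[..., A: {T}, ...]} -> {[..., A: T, ...]}: one tuple per member."""
    return [{**p, attr: t} for p in s for t in p[attr]]


def unnest_tuple_attr(s, attr):
    """Case 3, {[..., A: [B1..Bk], ...]}: splice the nested tuple into its parent."""
    result = []
    for p in s:
        rest = {k: v for k, v in p.items() if k != attr}
        result.append({**rest, **p[attr]})
    return result


# Hypothetical sample values.
visit_cour = unnest_sets([{"c1", "c2"}, {"c2", "c3"}])
dept_emp = unnest_set_attr([{"dept": "d1", "emps": ["e1", "e2"]}], "emps")
flat = unnest_tuple_attr(
    [{"dname": "d1", "staff": {"ename": "Ana", "salary": 10}}], "staff"
)
```

A single dispatching `unnest` could inspect the shape of its argument and delegate to one of these helpers, which mirrors the idea of offering one operation for all three structures.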

The effect of the unnest operation can be seen as releasing the structure of the nested component, obtaining in this way an unstructured group of objects in place of the nested component. The unstructured group of objects is then merged with the parent structure of the unnested component. The unstructured group of objects contains unlabeled objects if the set is unnested. If the tuple structured object is unnested, the group obtained includes labeled objects. The resulting groups can be represented as sequences of the form $/o_1, \ldots, o_k/$. Merging of sequences with their parent objects is defined by means of rules. Note that objects are denoted by $o_i$ and labeled objects are denoted by $l_i$. The rules are defined as follows:

(1) $\{/o_1, \ldots, o_k/\} = \{o_1, \ldots, o_k\}$,
(2) $/o_1, \ldots, o_k/ \; /o_{k+1}, \ldots, o_m/ = /o_1, \ldots, o_m/$,
(3) $[\ldots, a_i : /o_1, \ldots, o_n/, \ldots] = /[\ldots, a_i : o_1, \ldots], \ldots, [\ldots, a_i : o_n, \ldots]/$,
(4) $[\ldots, a_i : /l_1, \ldots, l_n/, \ldots] = [\ldots, l_1, \ldots, l_n, \ldots]$.

The first rule states that the set which contains the sequence of objects is identical to the set of objects. The second rule defines a concatenation of two sequences. The third rule describes the situation where a sequence of unlabeled objects is a component of a tuple. Finally, the fourth rule describes the unnesting of tuple objects. The unnest operator can now be defined as an operation which changes the structure of a desired component into a sequence.

Example 17 The queries from Example 16 can now be expressed in a new form as follows: in the first query, the unnest operation is applied to the sets nested in the argument set. The second query applies the unnest operation to the set nested in a tuple, resulting in a flat relation.

    VisitCour = student.ext.cour.
        apply( unnest );
    DeptEmp = department.ext.
        tuple( dept: id, emp: id->staff.unnest );

The operation apply_at, described in the following subsection, serves as a tool for applying a query to an arbitrary object component. This means that the semantics of the unnest operator can replace the previous three definitions (Def. 26). More examples of using the unnest operation will be given later.


4.4.8 Querying object components

To be able to apply a query to any nested component of a complex object, the functionality of the apply function is extended by adding a new parameter which serves as a component selector. The resulting operation is called apply_at. The operation apply_at(p, f) first identifies one or more component sets by applying the aggregation path p to the argument object. The query f is then applied to every selected set. The evaluation of the aggregation path serves solely for component identification and does not restructure the argument complex object. The only manipulation activity that results from the apply_at operation is the result of the argument function f evaluation.

Example 18 Let us consider an example to illustrate the use of the apply_at operation. Given a variable which includes the set of structures that describe departments, the query presented below filters every component of the department that is identified by the attribute name staff. The resulting objects, i.e. the elements of the set dept, include only employees older than 45.

    var dept: { department.val };
    dept = dept.apply_at( staff, select( id->age > 45 ));

The semantics of the apply_at operation can be described using the ordinary apply operation. The evaluation of apply_at is realized by the recursive routine described in Figure 4.1. The path expression is evaluated by recursive traversal through the object structure. Every time the path expression becomes empty, which means that the selected component has been reached, the query parameter is evaluated on the selected component object. There are two interchanging phases in the algorithm: (1) the recursive evaluation of the path expression p (lines 8-13) and (2) the parameter query evaluation (lines 3-6).

At every step of the first phase, the first attribute from the path expression is evaluated on the current input argument s. The rest of the path expression is evaluated by recursive application of the apply_at operation. The first attribute of the path p is evaluated either on a set of objects or on a single object. In the first case, the ordinary apply operation is used to evaluate the attribute on the set of objects (line 9), resulting in a set of objects. The apply_at operation is then recursively applied to each element of the previously obtained set. In the second case, the apply_at operation is applied to the tuple component selected by the first attribute in the path.

Input: An o-value s, a parameter aggregation path p and a parameter query f.
Output: An o-value s modified by applying a query f on components accessed using the aggregation path p.

Algorithm:

     1. function s.apply_at( p, f );
     2. begin
     3.   if p is empty path then begin
     4.     s1 = s.apply( f );
     5.     replace s by s1 in the original input o-value s;
     6.     return s1;
     7.   end else begin
     8.     if s is set then
     9.       foreach e in s.apply( first(p) ) do
    10.         e.apply_at( rest(p), f );
    11.     if s is tuple then
    12.       s.first(p).apply_at( rest(p), f );
    13.     return s;
    14.   end;
    15. end_apply;

Figure 4.1: Algorithm for evaluating the apply_at operator

In the second phase of the evaluation of apply_at, the parameter query f is evaluated on the selected component s. The result is an o-value s1, which replaces the component s in the original argument object. If the parameter query f includes restructuring operations, the type of the argument object is changed in the selected component. Therefore, the operation can either retain the structure of the argument object or can restructure the selected component according to the argument query.
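The two phases can be condensed into a short recursive Python function. The sketch differs from the routine of Figure 4.1 in one deliberate respect: instead of replacing the component inside the original o-value, it returns a rebuilt copy. Sets of tuples are modelled as lists of dictionaries, the path as a list of attribute names, and the parameter query f as a function applied to the selected component; all names and data are hypothetical.

```python
def apply_at(s, path, f):
    """Apply f to every component of s reached by the aggregation path."""
    if not path:
        # Phase 2: the selected component is reached, evaluate the query.
        return f(s)
    if isinstance(s, list):
        # Phase 1 on a set: descend into each element with the same path.
        return [apply_at(e, path, f) for e in s]
    if isinstance(s, dict):
        # Phase 1 on a tuple: follow the first attribute, keep the others.
        first, rest = path[0], path[1:]
        return {**s, first: apply_at(s[first], rest, f)}
    raise TypeError("path does not match the structure of the o-value")


# Hypothetical department structures, as in Example 18.
depts = [
    {"name": "e4", "staff": [{"name": "Ana", "age": 50}, {"name": "Jim", "age": 30}]},
    {"name": "e1", "staff": [{"name": "Eva", "age": 61}]},
]

# dept.apply_at( staff, select( id->age > 45 ) )
older = apply_at(depts, ["staff"], lambda staff: [e for e in staff if e["age"] > 45])
```

Passing a restructuring function as f, for example one that maps each employee to a smaller tuple, changes the type of the selected component while the enclosing structure is preserved. Nested sets of sets, which the thesis addresses with the in attribute, are not handled by this sketch.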

Example 19 In the previous example, the query retained the structure of the argument object. In the following example we illustrate the use of the restructuring operation tuple as the parameter of the apply_at operation. Given the variable facult that describes the set of faculties, the query replaces every occurrence of the oid referring to an employee by a tuple that contains the name and the age of the employee.

    var facult: {[ name: string, depts: {[ name: string, emps: { employee }]} ]};
    var facult1: {[ name: string, depts: {[ name: string, emps: {[ name: string, age: int ]}]} ]};
    facult1 = facult.apply_at( depts.emps, id->tuple( name: id->name, age: id->age ));

If the parameter path expression includes the valuation function, then the path expression traverses through the structure of more than one complex object. The complex object which is the input to the apply operation serves as the starting point through which the evaluation of the parameter query is despatched to other objects in the database. This is illustrated in the following example.

Example 20 The apply_at operation in this example simulates an update operation. The apply_at operation traverses through the objects representing departments and the actual evaluation of the query id*1.5 takes place on instances of the class employee. The attributes describing the salaries of employees from departments e4 and e1 are increased by a factor of 1.5. The state of each changed object is updated and the object retains its identifier.

    var dps: {department};
    dps = department.ext.
        select( id->name in {"e4","e1"} ).
        apply_at( id->emps->salary, id*1.5 );

The path expression of the apply_at operation provides access to the flat sets. To be able to access the elements of nested sets (i.e. set of sets), an additional attribute named in is introduced. Its functionality is similar to the ordinary apply operation. If the argument of the in attribute is a set of objects, then the attribute that follows the in attribute is applied to every element of the in's argument.

Example 21 The query in this example demonstrates the use of the attribute in. The argument of the query contains a set of tuples, each of which describes the particular school by its name and the set of sets of students. For every student, the set of attended courses and his/her name is specified. The query accesses the sets of courses and replaces the set of oids by the set of tuples.

    var sg: {[ school: string, groups: {{ [name: string, attends: {course}] }} ]};
    var sgu: {[ school: string, groups: {{ [name: string, attends: {[cname: string, cinst: instructor]} ] }} ]};
    sgu = sg.apply_at( groups.in.attends,
        tuple( cname: id->name, cinst: id->instructor ));

4.5 Expressing Operations of Other Algebras

The expressive power of the algebra can be assessed by expressing the operations of other algebras. This section provides some examples in which the OVAL algebra is used to simulate the operations of other algebras. We demonstrate that the OVAL operations can express relational algebra operations, NF² restructuring operations and the join operation as defined by Shaw in [70]. We start with the conventional relational project operation.

Example 22 In this example, the relational project operator is simulated by the OVAL tuple operation. The relation representing employees is projected on the attributes name and manager.

    var et: {[ ename: string, emanager: employee ]};
    et = employee.ext->
        tuple( ename: id.name, emanager: id.manager );

The unnest function can be used together with the function tuple to simulate the relational operation Cartesian product.

Example 23 The Cartesian product of the student and course class extensions is realized using the tuple and unnest operations.

    var stXco: {[ stud: student, inst: course ]};
    stXco = student.ext.
        tuple( stud: id, inst: course.ext. unnest );

Other basic operations [45] of relational algebra can be expressed directly using OVAL operations. These operations are: union, difference and select. The object join operation ojoin [70] can be simulated using the OVAL operations tuple, unnest and select. The ojoin(s, t, p) operation first computes the Cartesian product of the sets of oids s and t and then selects pairs that satisfy the predicate p.

Example 24 The query in this example computes all pairs of students and instructors who live in the same city.

    var stXin: {[ stud: student, inst: instructor ]};
    stXin = student.ext.
        tuple( stud: id, inst: instructor.ext. unnest ).
        select( id.stud->addr.city = id.inst->addr.city );
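The simulation of ojoin as a Cartesian product followed by a selection can be sketched in Python as follows. The helper names, the default labels `stud` and `inst`, and the `CITY` table are assumptions made for illustration; pairs are modelled as dictionaries collected in a list.

```python
def cartesian(s, t, a, b):
    """All pairs [a: x, b: y] with x in s and y in t (tuple followed by unnest)."""
    return [{a: x, b: y} for x in s for y in t]


def ojoin(s, t, p, a="stud", b="inst"):
    """Cartesian product of s and t filtered by the predicate p."""
    return [pair for pair in cartesian(s, t, a, b) if p(pair)]


# Hypothetical data: oid -> city of residence (id->addr.city).
CITY = {"s1": "Ljubljana", "s2": "Maribor", "i1": "Ljubljana", "i2": "Koper"}

same_city = ojoin(
    ["s1", "s2"],
    ["i1", "i2"],
    lambda pair: CITY[pair["stud"]] == CITY[pair["inst"]],
)
```

Materializing the full product before filtering is the direct reading of the definition; it makes no attempt at the kind of optimization a real evaluator would need.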


The NF² unnest [62] operation can be expressed by the OVAL unnest operation.

Example 25 The query in this example unnests the relation specified by the attribute staff, which is a component of the argument relation dept. The first OVAL unnest operation unnests the set, which is specified by the attribute staff. The result of the first unnest operation is the set of tuples, where the components named staff include single tuples describing employees. The second unnest operation produces a flat relation.

    var dept: {[ dname: string, staff: {[ ename: string, salary: int ]} ]};
    var fdept: {[ dname: string, ename: string, salary: int ]};
    fdept = dept.apply_at( staff, unnest ).
        apply_at( staff, unnest );

The NF² nest [62] operation can be expressed by the operation group.

Example 26 Assume that a portion of an employee object is represented using the relation described below. The query shown nests the values of the attributes ename and salary.

    var femps: {[ ename: string, salary: int, works: department ]};
    var emps: {[ staff: {[ ename: string, salary: int ]}, works: department ]};
    emps = femps.
        group( works: id.works,
               staff: id.tuple( ename: id.ename, salary: id.salary ));

We demonstrated that the relational algebra operations can be expressed in the OVAL object algebra. Similarly, NF² restructuring operations can also be simulated using OVAL algebra operations. It is interesting to note that the recursive operations of NF² algebras used on arbitrary relations cannot be expressed by a single expression of the OVAL algebra or other existing object algebras [5].

4.6 Concluding remarks

This chapter described the OVAL algebra for objects. The algebra includes the relational operation selection, standard set operations, object restructuring operations, the functional operations apply and apply_at, and operations for querying schema. The algebra operations are intended to serve as the basis for a declarative query language. As a result, the set of operations is not designed to be minimal; rather, the functionality of some groups of algebra operations overlaps. Queries are constructed by means of function composition and the use of higher-order functions. The construction of an OVAL query can be described as follows: in the first step, the static type of the query result is defined or constructed. This type exhibits the structure of the desired data which has to be extracted from the database. The o-value of this type is constructed using a sequence of operations that gather data from the database and restructure it in a sequence of intermediate resulting objects (temporary objects) to build the final resultant o-value. Since the higher-order operations can also consist of a query, the query structure can be seen as a directed acyclic graph, with nodes representing the intermediate results of the query and arcs standing for operations that form the query. Query evaluation can then be seen as the flow of data that is formed by intermediate objects of query evaluation. The advantages of such a means of query perception and construction are as follows: first, the user can form a graph-based plan for a query. Next, the query consists of a number of steps which can be treated as separate sub-problems. The solutions of sub-problems are composed into the final query using the composition operation and higher-order functions. Therefore, the process of constructing the target data structure is incremental, where intermediate steps can be identified and expressed using a query function.
Additionally, the available operations exhibit a high level of abstraction, which provides the user with an expressive tool for querying a database. In this context, the operation apply_at provides a useful tool for manipulating and even restructuring the inner components of the object under construction. The presented style of constructing programs is different from the classical procedural programming language approach, and from the classical SQL or calculus-based approach for expressing queries. The approach follows Backus's functional view [10] of programming languages characterized by abstract operators, simple semantics of constructing more abstract programs from simpler programs and provably correct programs. Finally, the OVAL algebra provides a set of operations for querying conceptual schema.

The operations provide only basic tools for manipulating schema. In fact, most of the provided schema operations are drawn from the formalization of the data model given in Chapter 2. If the algebra is integrated into a computationally complete database programming language which supports the proposed data model, the proposed schema operations can be used to form more abstract operations on the schema. In this way, operations such as those suggested by Papazoglou in [58], e.g. finding associations between arbitrary classes or finding classes which exhibit similarities with a given class, can be implemented.


Chapter 5 Prototype Implementations

5.1 Introduction

Two prototypes of the OVAL query language were developed. The first prototype was implemented using Sicstus Prolog [73]. The prototype serves as an experimental environment for studying some aspects of object-oriented data models and object algebras. Some design decisions and approaches taken in this implementation can be used as the basis for a more efficient implementation. The second prototype is implemented as an extension of the E database programming language. It mainly served for studying the problems of OVAL integration with the C++ based database programming language. The apply and select operations are implemented in the prototype. In both prototypes we did not address the problems of efficient implementation and optimization. In the following subsections, we present an overview of some aspects of the Prolog and E based implementations.

5.2 Prolog based prototype

The prototype is intended to serve as an environment for experimenting with the OVAL data model and object algebra operations. We aimed to observe and study previously presented theoretical aspects of OVAL using an implementation that is flexible enough to be able to experiment. The implementation is not tied to any particular type system or subject to any other constraint, unlike, for example, our second prototype, which extends the E database programming language. Some of the practical motives in the implementation are the study of the substitutability principle and, in its presence, the implementation of type checking.

The prototype comprises an interpreter, a user-interface and a storage manager. The OVAL interpreter is implemented by a simple top-down interpreting technique. Type checking is implemented to provide the user with diagnostics for type errors. Some very obvious optimization tricks are used in the implementation to speed up the evaluation of some types of nested queries. Otherwise, query optimization is not considered in this prototype. The command-line interface is implemented in the Prolog environment. The user can evaluate a query, obtain a help system and set some system variables. The implemented storage system is based on Prolog clauses. We use the subset of the structural part of F-Logic [43] to represent objects. The storage manager consists of a set of routines that allow simple access to the database. In this section, we rst present a set of examples of use of the OVAL interpreter. Next, the type checking of OVAL queries is described in the second part of this section.

5.2.1 Examples

We give some examples of using the OVAL interpreter to demonstrate its functionality. The syntax of OVAL queries is slightly altered because of the special meaning of the dot "." in Prolog. The dot operator is replaced by the "->" operator, while the dereferencing operator "->" is changed to "=>". Otherwise, the syntax and the semantics of the implemented language follow the data model and algebraic operations described in Chapter 2 and 3. The following query computes the set of tuples describing students older than 24 years.

    OVAL 0.3, May 1994
    Copyright (C) Jozef Stefan Institute
    |: student->ext-> select( id=>age > 24 )-> tuple[name: id=>name, age: id=>age ].
    |---------------------------|
    |name:murn    |age:27       |
    |---------------------------|
    ...
    |---------------------------|
    |name:janko   |age:26       |
    |---------------------------|

The prototype allows the use of variables. A variable obtains its value and type by the first assignment. After that, it acts like any other OVAL object. In the following example, students are grouped by their age. The result is stored in the variable named a.

    |: a := student->ext-> group( age: id=>age, studs: id ).
    |-----------------------------|
    |age:22     |studs:|---------||
    |           |      |s1       ||
    |           |      |---------||
    |           |      |s3       ||
    |           |      |---------||
    |           |      |s8       ||
    |           |      |---------||
    |-----------------------------|
    ...
    |-----------------------------|
    |age:21     |studs:|---------||
    |           |      |s6       ||
    |           |      |---------||
    |-----------------------------|

The variable a stores an object identifier whose value represents the stored o-value. The previously computed value can therefore be obtained using the expression a->val. The following query demonstrates how the type of the value of the variable a can be obtained.

    |: a->class_of->val.
    |-----------------------------|
    |age:int_   |studs:|---------||
    |           |      |student  ||
    |           |      |---------||
    |-----------------------------|

We present another example of using schema information to query a database. Recall the example of using comparison operations to relate oids from Subsection 4.3.3 (Example 3). The query selects all persons whose class is more specific than employee and more general than or equal to stud_assist. The following problem arises with the query. The type of the result of the query person->exts is {person}. Due to the substitutability principle, the elements of the resulting set can be instances of the class person and/or instances of its subtypes. Since our implementation stores the types of intermediate complex objects and does not remember the types of individual components, the implicit iteration variable id is of the type person. Therefore, the expression id < employee would not compute correct results. The function class_of can be used here to obtain the actual class of oids. The following query computes the classes of objects resulting from the described selection.

    |: person->exts->
           select( id->class_of < employee and
                   stud_assist =< id->class_of )->
           class_of.
    |-----------|
    |lecturer   |
    |-----------|
    |assistant  |
    |-----------|
    |stud_assist|
    |-----------|
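The role of class_of in this query can be sketched in Python. The class hierarchy below is an assumption chosen so that the result matches the output above (the thesis schema itself is not shown in this section), and the names PARENTS, CLASS_OF, leq and lt are hypothetical.

```python
# Assumed isa hierarchy: class -> direct superclasses.
PARENTS = {
    "person": [],
    "student": ["person"],
    "employee": ["person"],
    "lecturer": ["employee"],
    "assistant": ["lecturer"],
    "stud_assist": ["assistant", "student"],
}

def leq(c1, c2):
    """True if c1 is equal to c2 or more specific than c2 (c1 =< c2)."""
    return c1 == c2 or any(leq(p, c2) for p in PARENTS[c1])

def lt(c1, c2):
    """True if c1 is strictly more specific than c2 (c1 < c2)."""
    return c1 != c2 and leq(c1, c2)

# Actual class of each oid; the static type of every element is just person.
CLASS_OF = {"p1": "person", "p2": "student", "p3": "employee",
            "p4": "lecturer", "p5": "assistant", "p6": "stud_assist"}

person_exts = list(CLASS_OF)  # stand-in for person->exts
selected = [oid for oid in person_exts
            if lt(CLASS_OF[oid], "employee") and leq("stud_assist", CLASS_OF[oid])]
classes = [CLASS_OF[oid] for oid in selected]
```

Comparing the static type person with employee would reject every element, which is why the comparison is made on the class obtained at run time.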

5.2.2 Type checking

A compiler or an interpreter should check the syntactic and semantic conventions of the source language [6]. One way of controlling the correctness of a program is type checking. Here, errors which arise when an operator is applied to incompatible operands are reported by a type checking procedure. Two kinds of type checking methods are commonly used. Static type checking allows us to find a type error at compile time. Dynamic type checking verifies the correctness of operation arguments at run time. Static type checking requires that the resulting type of any language operation can be determined statically.

Static type checking of OVAL queries is realized by a set of rules, written in a denotational-semantics style [56] similar to that used by Cardelli in [18]. Given the query expression, the type checking routine deduces the type of the resulting object. One or more rules can be used to derive the resultant type for a single algebra operation (the type checking procedure is described in more detail in Appendix B). We distinguish between rules that are used to derive the type of queries obtained by a composition of basic algebra operations and rules which serve to derive the type of the expression used as the selection predicate or as a separate query.

The substitutability principle is one of the more important properties used in the type-checking procedure. The domain of a variable of type T is the inherited interpretation of the type T. This correlates with the functionality of the type checking procedure, which deduces the least upper bound type of the resulting objects for a given operation. Hence, objects which are the result of evaluating a query q can be instances of the type obtained by evaluating the type rules on the query q as well as instances of its subtypes. The main reason for this is the semantics of the attributes and methods: Definition 10 and Corollary 4 state that a function (i.e., an attribute or a method) which overrides some other function must have a more specific or equally specific range type.

First, some basic type checking rules for OVAL expressions are presented. The rules which describe the derivation of types for the tuple component selection and the valuation function are the following:

\[
\begin{array}{lcl}
\mathcal{T}[\![ o ]\!] & = & o.\mathit{type} \\[4pt]
\mathcal{T}[\![ o.A ]\!] & = & \mathbf{if}\ \mathcal{T}[\![ o ]\!] = [\ldots, C_1{::}A : T_1, \ldots, C_l{::}A : T_l, \ldots],\ l \geq 1 \\
 & & \mathbf{and}\ \exists C_i \forall C_j : (C_i, C_j \in \{C_1, \ldots, C_l\} \wedge C_i \preceq_i C_j) \\
 & & \mathbf{then}\ T_i\ \mathbf{else\ fail} \\[4pt]
\mathcal{T}[\![ o.C_k{::}A ]\!] & = & \mathbf{if}\ \mathcal{T}[\![ o ]\!] = [\ldots, C_k{::}A : T, \ldots]\ \mathbf{then}\ T\ \mathbf{else\ fail} \\[4pt]
\mathcal{T}[\![ o.\mathit{val} ]\!] & = & \mathbf{if}\ \mathcal{T}[\![ o ]\!] = C\ \mathbf{and}\ C \in O_C\ \mathbf{then}\ \text{(inherited type of } C\text{)}\ \mathbf{else\ fail}
\end{array}
\]

The expression C::A denotes the attribute A, which is defined as a property of the class C. The first rule simply picks the type of the variable. The second rule describes tuple component selection in the presence of attribute overriding: the attribute defined within the most specific class is selected. In the case of multiple inheritance (rule 3), the class in which the attribute is defined must be specified explicitly. The fourth rule describes the type checking of the valuation function. The type of the valuation function argument must be a referential type. The type of the value is obtained by using the inherited type assignment function.

Let us now describe the semantics of the rule for the derivation of the type of the result of the operation apply. Suppose that we evaluate the query apply(id->A) on a set of oids of the class C. The range type of the attribute A, defined on the class C, is a type TA. Note that the argument set can include instances of C as well as instances of more specific classes. Since the attributes and methods defined by more specific classes can only refine the range type of the attribute (see Definition 10 and Corollary 4), the resultant set includes instances of the type TA and instances of the subtypes of TA. As an example, suppose that the argument of the query apply(id->works_for) is the set of oids referring to objects of the class person. The set can also include instances of any subclass of person. The result of the apply operation is the set of oids which refer to elements of the class organization. More precisely, the oids can refer to instances of the class organization as well as to instances of any of its subclasses (e.g. business organization). The rule for deriving the type of the result of the ordinary apply operation is specified as follows:

\[
\mathcal{T}[\![ Q_1.\mathit{apply}(Q_2) ]\!] \;=\; \mathbf{if}\ \mathcal{T}[\![ Q_1 ]\!] = \{T_1\}\ \mathbf{and}\ T_2 = \mathcal{T}[\![ Q_2 ]\!]\ \mathbf{then}\ \{T_2\}\ \mathbf{else\ fail}
\]

The rule states that the result of the query that precedes the apply operation has to be a set. The type of the result of the query Q2 specifies the structure of the elements of the resulting set. If the type T1 is not compatible with the query Q2, the algorithm stops and reports a failure in one of the rules that check the type of the query Q2. The following rule specifies the type checking of the operation group.

\[
\mathcal{T}[\![ Q.\mathit{group}(A : Q_1, B : Q_2) ]\!] \;=\; \mathbf{if}\ \mathcal{T}[\![ Q ]\!] = \{T\}\ \mathbf{and}\ T_1 = \mathcal{T}[\![ Q_1 ]\!]\ \mathbf{and}\ T_2 = \mathcal{T}[\![ Q_2 ]\!]\ \mathbf{then}\ \{[A : T_1, B : \{T_2\}]\}\ \mathbf{else\ fail}
\]

The rule first checks that the input type of the group operation is a set of o-values. The types of the queries Q1 and Q2 are then determined. If a type error occurs when computing the types of Q1 and Q2, the type checking routine simply stops. This can be the result of an inappropriate use of instances of the (implicit) input type T in the queries Q1 or Q2. The resulting type is the set of tuples whose type is composed of the types T1 and T2.
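The rules for attribute selection, apply and group can be turned into a small recursive type checker. The Python sketch below is only an illustration under stated assumptions: queries are encoded as nested tuples, set and tuple types are encoded as tagged tuples, the schema is hypothetical, and multiple inheritance conflicts (rule 3) are not handled. The names SCHEMA, attr_type, element_type and type_of are not taken from the prototype.

```python
class TypeCheckError(Exception):
    """Raised where the rules in the text evaluate to `fail`."""

# Hypothetical schema: class name -> (superclasses, attributes defined in the class).
SCHEMA = {
    "organization": ([], {"name": "string"}),
    "person": ([], {"name": "string", "age": "int", "works_for": "organization"}),
    "student": (["person"], {}),
}

def attr_type(cls, attr):
    """Type of attribute `attr` seen from class `cls`; the most specific definition wins."""
    supers, attrs = SCHEMA[cls]
    if attr in attrs:
        return attrs[attr]
    for sup in supers:
        try:
            return attr_type(sup, attr)
        except TypeCheckError:
            continue
    raise TypeCheckError(f"class {cls} has no attribute {attr}")

def element_type(t):
    """Return T for a set type {T}; fail for anything else."""
    if isinstance(t, tuple) and t[0] == "set":
        return t[1]
    raise TypeCheckError(f"expected a set type, got {t!r}")

def type_of(q, elem=None):
    """Derive the static type of query `q`; `elem` is the type of the implicit variable id."""
    op = q[0]
    if op == "ext":                      # class extension: {C}
        return ("set", q[1])
    if op == "id":                       # implicit iteration variable
        if elem is None:
            raise TypeCheckError("id used outside of an iteration")
        return elem
    if op == "attr":                     # attribute selection on an object
        t = type_of(q[1], elem)
        if not (isinstance(t, str) and t in SCHEMA):
            raise TypeCheckError(f"{t!r} is not a class type")
        return attr_type(t, q[2])
    if op == "apply":                    # Q1.apply(Q2) : {T2}
        t1 = element_type(type_of(q[1], elem))
        return ("set", type_of(q[2], t1))
    if op == "group":                    # Q.group(A: Q1, B: Q2) : {[A: T1, B: {T2}]}
        _, src, a, q1, b, q2 = q
        t = element_type(type_of(src, elem))
        return ("set", ("tuple", ((a, type_of(q1, t)), (b, ("set", type_of(q2, t))))))
    raise TypeCheckError(f"unknown operation {op}")

students = ("ext", "student")
apply_type = type_of(("apply", students, ("attr", ("id",), "works_for")))
group_type = type_of(("group", students, "age", ("attr", ("id",), "age"), "studs", ("id",)))
```

Because of substitutability, the derived type {organization} is an upper bound: the elements of the actual result may belong to subclasses of organization.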

5.3 Extending E database programming language

The second prototype extends the E database programming language [23] with the operations of the OVAL query language. The main purpose of the prototype implementation is to study the integration of OVAL with a C++ based database programming language. The functional nature of OVAL makes the language suitable for integration with a procedural language; the semantics of dot expressions, used for accessing object components in procedural programming languages (e.g. C++), is extended to include functional queries. In this section, we first describe some approaches to the integration of databases and general purpose programming languages. The basic features of the E database programming language are presented in Subsection 5.3.2. Some implementation details of the integration of OVAL with the E database programming language are given at the end of this section. A more detailed description of the implementation is presented in [40].

5.3.1 Integrating databases and programming languages

The practical need for the integration of databases and general purpose programming languages has stimulated substantial research in this area [9]. Integration results in a programming environment usually denoted as a database programming language (DBPL). A DBPL is a computationally complete programming language which provides constructs for the definition and manipulation of database objects. Among the most important features of a DBPL are persistence, type completeness and expressive power [9].

Persistence is a property of objects created by a programming language system. A transient object exists only while its creator (i.e., a program) is active. A persistent object can survive many executions of its creator. The object retains its state after its creator has terminated and can be used by the same or another program in subsequent executions. The type completeness property requires that all data types enjoy the same status within the language. The use of type constructors provided by the language should not be limited by any particular rule. Therefore, the user can use existing type constructors in an arbitrary way to construct new types. By the expressive power of a language we mean the ability to compute a class of computable functions and the declarative expressiveness of the language.

In general, the user would like to enjoy the computational completeness of a procedural programming language or the like, while being able to express declarative queries usually provided by a query language. Due to the often very different syntax and semantics of these two languages, their integration results in impedance mismatch problems [8]. Some of the existing approaches to the integration of declarative query languages with high-level programming languages are described below.

Embedding a query language into a high-level programming language is the most widely used approach. The incompatibility of the syntax and semantics of these two languages is apparent. The relational database programming languages integrate a relational DBMS with general purpose programming languages. These languages usually use the record type as a platform for modelling relations. Access to and manipulation of records is provided by the iteration construct foreach. Similarly, the database programming languages E [23], O2 DBPL [49], ODE [7] and others integrate an object-oriented DBMS with an object-oriented programming language. Such a DBPL provides constructs for the definition of persistent objects and allows the manipulation of persistent objects using explicit iteration. However, neither relational nor object-oriented DBPLs provide a set of declarative constructs with expressive power similar to that of SQL-based query languages. The functional database programming languages [53, 25] add a set of predefined algebraic operations that use implicit iteration to the language. The queries are comparable to the queries expressed by SQL-based query languages like the O2SQL [11] query language, although the latter can offer greater simplicity for some queries [25].

5.3.2 E database programming language

The E database programming language was originally intended to serve as the basis for database management system implementation. It is built on top of the Exodus storage manager [24]. The E language extends C++ by adding constructs which allow the definition of persistent objects. Further, persistent objects are treated in the same way as ordinary or transient objects. The E language provides a set of macros that implement different forms of explicit iteration on collections. A brief overview of the main ideas of the E language is given in the following paragraphs. A more detailed description of the E language can be found in the Exodus documents [23] and in [61].

The language E allows the definition of persistent objects. Persistence is provided by the use of database types and the persistent storage class. Any type definable in C++ can be analogously defined as a database type (db type). In this way, we have primitive db types: dbint, dbchar, etc. and db type constructors: dbclass, dbstruct, dbunion and dbpointer. To allow the definition of persistent objects, E introduces the keyword persistent, which has to precede the definition of the persistent data structure. The following simple program counts the number of times it has been run.

    persistent dbint count;

    main() {
        printf("Program has been run %d times", count++);
    }
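The effect of the persistent storage class can be imitated in a language without built-in persistence by loading and storing the value explicitly. The following Python sketch does this with a JSON file; the file name, the function run_once and the initial value 0 are assumptions made for illustration, and the sketch says nothing about how the Exodus storage manager actually stores persistent objects.

```python
import json
import os
import tempfile

def run_once(path):
    """One 'program run': report the stored counter, then store it incremented."""
    count = 0                              # assumed initial value of the counter
    if os.path.exists(path):
        with open(path) as f:
            count = json.load(f)["count"]
    message = f"Program has been run {count} times"
    with open(path, "w") as f:
        json.dump({"count": count + 1}, f)  # like count++: report first, then increment
    return message

store = os.path.join(tempfile.mkdtemp(), "count.json")
first = run_once(store)
second = run_once(store)
```

In E the explicit load and store steps disappear: the declaration persistent dbint count is enough for the value to survive between runs.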

The primary E tools for organizing data in a database are collections, which are incorporated into the language by the class template collection<T>. The lifetime of objects in a collection is determined by the lifetime of the collection. The same holds for tuple-structured objects. Persistence is therefore inherited by the components of structures. Any component must, of course, be defined by a db type. Objects in collections are created and deleted using modified new and delete operators. Browsing the database can be realized by the use of different versions of the ITERATE macro. The parameters of ITERATE are the iteration function and the user-defined function. In general, any form of the iteration macro picks up the elements provided by the iteration function and applies the specified user-defined function to each element. The user-defined function is specified by an ordinary C++ function. The iteration function is similar to an ordinary C++ function, except that it includes one or more YIELD macros. The YIELD macro identifies the elements of the iteration. Its semantics is similar to the semantics of the C++ return statement, except that it does not actually complete the computation of the iteration function. The execution of the function is interrupted by the YIELD macro and the value specified by YIELD is returned by the iteration function. On the next call of the iteration function, the computation continues after the YIELD macro statement, until the iteration function actually terminates by executing a return statement or by reaching the end of the iteration function code.
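The interplay of the iteration function and the user-defined function resembles generators in languages that support them. The Python sketch below is an analogy only, not E code: the yield statement plays the role of the YIELD macro, and the helper iterate stands in for one form of ITERATE. Both function names are hypothetical.

```python
def even_elements(collection):
    """Iteration function: hands out the selected elements one at a time."""
    for x in collection:
        if x % 2 == 0:
            yield x      # suspend here; the next request resumes after this point
    # reaching the end of the function terminates the iteration

def iterate(iteration, user_fn):
    """Rough analogue of ITERATE: apply user_fn to every yielded element."""
    for element in iteration:
        user_fn(element)

seen = []
iterate(even_elements([1, 2, 3, 4, 6]), seen.append)
```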

5.3.3 Integrating OVAL and E DBPL

The integration of OVAL into the database programming language E is implemented by a preprocessor designed as an extension of the existing C++ parser cppp [26], written using the Unix tools Yacc and Lex. The currently implemented features of the query language are: support for class extensions and OVAL sets, the operation apply, which is used for evaluating class data members and member functions on the argument set of objects, and the operation select. These constructs exhibit the basic characteristics of the language.

OVAL sets are implemented using the class template setof<T>. The generic type T can be any legal E db type. The template class setof defines a set of public functions that provide insertion, deletion and set manipulation. It also includes a definition and partial realization of the select operation. The select function is a member function of the setof template class and realizes the iteration over the given set. The selection parameter expression is compiled to a separate function which returns a boolean value. This function is passed as a parameter to the member function select. Each E dbclass is extended by the concept of a class extension. The class extension is a set of object identifiers that represent instances of this class. The class extension must obey the following restrictions.

- The class extension is defined for dbclasses.
- The class extension is a persistent set of dbpointers.
- The class extension includes only persistent instances of the dbclass.
- An instance object can be an element of a single class extension only.

The class extension can be accessed using the functions ext and exts. The expression person.ext, for example, denotes the extension of the class person. The function exts returns the instances of a given class and of all its subclasses. The expression that specifies access to the class extension is compiled to the name of the set which holds the class extension. The class extension can be used anywhere in place of an ordinary set of objects.

As we have already said, the select operation is realized by the member function of the setof template and by a separate routine in which the selection expression is implemented. The select condition can use all previously defined operations, including existential and universal quantifiers. Queries can be arbitrarily nested, as shown in the previous chapter. The following example presents the use of the select statement in the extended E database programming language. The select operation filters the set p by selecting from it the persons who are younger than the value of the variable age and do not work in Ljubljana.

    int age;
    setof<person> p;

    setof<person> ps = p.select( this->age < ::age &&
                                 !(this->works in institution.ext.
                                       select( this->addr = "Ljubljana" )) );

The variable this denotes an implicit variable which iterates through the argument set of select. Globally defined variables can be used in the selection. The name of such a variable must be preceded by "::" in the expression, so that it can be distinguished from the attribute names.
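The division of work between the select member function and the separately compiled selection function can be sketched as follows. The Python code is an illustration under stated assumptions: SetOf is a tiny stand-in for the setof template, the data and the attribute names works and addr are hypothetical, and the nested condition is read as "the person's institution is not among the institutions located in Ljubljana", which is one possible interpretation of the example above.

```python
from dataclasses import dataclass

class SetOf:
    """Tiny stand-in for the setof<T> template: insertion plus select."""
    def __init__(self, items=()):
        self.items = []
        for item in items:
            self.insert(item)

    def insert(self, item):
        if item not in self.items:       # a set holds no duplicates
            self.items.append(item)

    def __contains__(self, item):
        return item in self.items

    def select(self, predicate):
        """Iterate over the set and keep the elements satisfying the predicate."""
        return SetOf(x for x in self.items if predicate(x))

@dataclass(frozen=True)
class Institution:
    name: str
    addr: str

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    works: Institution

ijs = Institution("ijs", "Ljubljana")
um = Institution("um", "Maribor")
institution_ext = SetOf([ijs, um])       # plays the role of institution.ext
p = SetOf([Person("ana", 25, um), Person("bor", 25, ijs), Person("cene", 40, um)])
age = 30                                 # global variable, written ::age in the query

def selection_1(this):
    """The selection expression, compiled to a separate boolean function."""
    in_ljubljana = institution_ext.select(lambda inst: inst.addr == "Ljubljana")
    return this.age < age and not (this.works in in_ljubljana)

ps = p.select(selection_1)
names = [person.name for person in ps.items]
```

The parameter this of selection_1 corresponds to the implicit iteration variable, while the module-level name age corresponds to the global variable referred to as ::age.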

5.4 Conclusions

Two prototype implementations are presented in this chapter. The first prototype is written in the programming language Prolog and serves as a tool for studying the properties of the OVAL data model and algebra. The prototype is designed to be flexible enough to allow experimenting with the designed language. The type checking procedure is defined in the form of a set of type checking rules. The procedure can derive the type of a query result statically in the presence of the substitutability principle and by respecting the constraints imposed by the data model. What remains to be done is the complete proof of the correctness of the type checking procedure.

The second prototype is written as a preprocessor to the E database programming language, which itself provides constructs for representing and manipulating persistent objects. The work on this prototype addresses the impedance mismatch problem between a procedural programming language and a functional query language. Only a subset of the OVAL object algebra is currently implemented in the prototype. The implemented features are: support for class extensions, and the apply and select operations. The currently implemented constructs of the OVAL query language are similar to the query language of the ObjectStore object-oriented database system [57]. The ObjectStore query language is designed to fit in with the C++ programming language. It includes support for filtering collections of object identifiers and for the use of methods, attributes and quantified variables. What remains to be done on this prototype implementation is the realization of the query optimizer and the implementation of the complete set of OVAL operations.


Chapter 6 Conclusions

6.1 Summary

In this work we present the formalization of an object-oriented database model, an object algebra and a functional query language. The formalization shows that the object-oriented database can be perceived as a uniform set of objects. The class objects represent abstract concepts, while the instance objects represent concrete entities. The set of database objects is ordered by the isa relationship, which exhibits the properties of the specialization/generalization abstraction. Similarly, we define the partial ordering of structured values. This partial ordering subsumes the usual subtype partial ordering of types as defined by Cardelli in [18]. The substitutability principle is presented and some of its properties are described. The behavior of objects is integrated into the structural model by methods. Signatures are defined to describe the interface of the methods. As for objects and structured values, the signature interpretation and the partial ordering of signatures are defined. Finally, some properties of behavioral inheritance are described in terms of overriding and rules for resolving multiple inheritance conflicts.

In the work on the presented object algebra, we address the problems of querying the database conceptual schema and manipulating nested components of structured objects. The algebra consists of model-based operations and basic algebraic operations. The model-based operations are intended to manipulate object properties, which are described using the constructs provided by the OVAL data model. The basic algebra operations are used for querying, restructuring and changing the contents of objects stored in a database. The algebra operations are designed to be easily understood and to cover the user's needs for querying a database.

The OVAL algebra operations are functions which can be combined by means of function composition and by using higher-order algebra operations. The obtained functional query language includes all abilities offered by the SQL-based query languages [11, 29, 42, 30, 57]. In addition, we see the following advantages of the proposed language. First, the functional nature of the obtained query language fits in with the syntax and semantics of the database programming languages based on C++. Secondly, the use of function composition forces the programmer to design queries in a step-by-step manner. In this way, a complex query can be defined by a sequence of simpler queries or primitive operations. For these reasons, the OVAL query language turns out to be appropriate for manipulating objects with complex composition and classification structure.

The original features of the proposed algebra are the operation apply_at, intended for querying object components, and the set of operations for querying the database conceptual schema. The apply_at operation is a generalization of the well-known operation ApplyToAll [10]. The apply_at operation has two parameters: the path expression and the query. The path expression identifies the nested components where the parameter query is evaluated. Its semantics is simple, hence it is appropriate as a construct of a declarative query language. Furthermore, some results show that apply_at can be utilized to manipulate more complex data structures (e.g., trees and graphs) by extending the expressive power of the parameter path expression.

The use of schema information for querying a database was studied. We observe that by using an object-oriented database model, some properties of objects are represented by the database conceptual schema. In this way, the classification of objects and some aspects of the object composition are represented in an object-oriented database.
The proposed model-based operations provide the means for relating data objects to the conceptual schema and for browsing the conceptual schema. Firstly, two types of extension operations can be used to obtain the ordinary or inherited interpretation of classes. Secondly, the valuation operation is used to obtain the properties of class or instance objects. Next, the partial ordering of objects describes object properties which pertain to their classification. The set of comparison operations is defined to allow us to relate objects to the partially ordered set of objects and o-values. Functionality similar to that of the comparison operations can be achieved by the use of closure operations, which are defined on the isa poset. Finally, the lub-set and glb-set operations are defined to allow us to obtain the common properties of a set of objects.

Two prototypes of the OVAL query language are implemented. First, the OVAL query language is implemented using Sicstus Prolog [73]. The prototype serves for studying the semantics of operations, the use of the substitutability principle in query languages, and the type checking of OVAL queries. The type checking algorithm is given by a set of rules describing the derivation of the type resulting from each operation application. The second prototype implementation is used to demonstrate the suitability of the OVAL query language for integration with a database programming language based on C++. The prototype was implemented as an extension of the E database programming language [23]. The functional nature of the OVAL query language makes it suitable for integration with procedural database programming languages. Some implementation details and example queries are presented.
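The idea behind apply_at, as summarized in this section, can be illustrated with a small Python sketch. It rests on several assumptions that are not confirmed by the text: complex objects are modelled as nested dictionaries and lists, a path expression is a plain list of attribute names, the operation distributes over set-valued components, and the result is a copy of the object in which the addressed component is replaced by the result of the parameter query. The function name apply_at is reused only for readability.

```python
def apply_at(value, path, query):
    """Evaluate `query` at the nested component(s) reached by `path`.

    Returns a rebuilt copy of `value`; the input itself is left unchanged.
    """
    if not path:
        return query(value)               # the addressed component is reached
    if isinstance(value, list):           # set-valued: descend into each element
        return [apply_at(v, path, query) for v in value]
    head, rest = path[0], path[1:]
    rebuilt = dict(value)                 # tuple-valued: copy and replace one component
    rebuilt[head] = apply_at(value[head], rest, query)
    return rebuilt

# Hypothetical data: departments with a nested set of staff members.
departments = [
    {"name": "cs", "staff": [{"name": "a", "age": 30}, {"name": "b", "age": 50}]},
    {"name": "ee", "staff": [{"name": "c", "age": 45}]},
]

# Keep only the staff members older than 40 inside every department.
senior = apply_at(departments, ["staff"],
                  lambda staff: [s for s in staff if s["age"] > 40])
```

With an empty path the sketch degenerates to applying the query to the whole object, which mirrors the relation to ApplyToAll only loosely.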

6.2 Contributions

The object-oriented data model formalization, the object algebra and the functional query language OVAL are proposed in this thesis. The work includes the following contributions:

- the object data model formalization
  - the formalization of the structural part of the object-oriented database model, which unifies the schema and the instance levels of the object-oriented database (Section 3.2),

- the object algebra
  - the operations for querying the conceptual schema (Section 4.3): the use of comparison operations based on the o-value poset and the use of the lub-set and glb-set operations for querying purposes (Subsections 4.3.3 and 3.3.5), and the definition of two types of value equality operations, class-based equality and deep equality (Subsection 4.3.6),
  - the definition of the operation apply_at, intended for querying nested components of composite objects (Subsection 4.4.8),

- extending a database programming language with declarative querying facilities
  - the design and the implementation of the static type checking procedure for OVAL queries in the presence of the substitutability principle (Subsection 5.2.2) and
  - a proposal and implementation of the integration of the functional query language with a procedural database programming language based on C++ (Section 5.3).

6.3 Further work

There are a number of research directions which would enhance the work presented in this thesis. In this section, we give an overview of some possible extensions of our work on the object-oriented database model formalization and the object algebra, as well as of their implementation.

Our work on the database model formalization can be continued by studying the following problems. First, there is still a gap between the schema and instance levels of the database: the data model prevents the definition of o-values composed of instance objects and class objects. By removing this restriction, an instance of the class person, for example, could include a reference to a subclass of the class job, indicating the type of her/his job (e.g. lawyer). The particular instance of the class person would then have the value [..., name: "tone", job: lawyer, ...], for example. The consequences of this decision for the formal model as well as for the expressive power of the data model should be studied. Similar ideas appear in logic-based declarative languages [43]. Secondly, we consider set-valued objects to be the main obstacle to a clearer formalization, since there are no convincing mechanisms, to our knowledge, that would enable a clean integration of set-valued objects and the inheritance principle. Let us illustrate this problem by an example. Suppose that the type of the class C is T(C) = {C1}. It is not clear, in general, what the structure of the subclasses of the class C is, since they can have set-structured as well as tuple-structured types.

The work which remains to be done on the OVAL algebra is as follows. Firstly, the algebraic properties of the operations should be studied. They can be utilized for query optimization. The set of OVAL operations overlaps significantly with the operations of the EQUAL query algebra [54]. In [54], Mitchell suggests an architecture for an extensible query optimizer based on the EQUAL query algebra.
Similarly, the work on the Aqua algebra [50], which is in turn drawn from the EXTRA [82] and EQUAL [54] algebras, is intended to serve as a basis for query optimizers. The main aim of the Aqua algebra is to support a wide range of algebraic operations defined to manipulate objects. The work on the EQUAL extensible query optimizer and on the Aqua query optimizer can be used as a basic platform for work on the optimization of OVAL queries.

Secondly, the expressive power of the OVAL object algebra can be extended with additional abilities to manipulate recursive data structures such as trees and graphs. We feel that this can be achieved by extending the expressive power of the path expressions used as the parameter of apply_at. Recent results show that least fixpoint queries, which can be expressed by the OVAL close operation, can be easily expressed by the apply_at operation with enriched semantics of its parameter path expressions. Finally, it would be interesting to see which operations and language constructs have to be added to the OVAL query language to be able to express all computable functions.


Bibliography

[1] S. Abiteboul, N. Bidoit, Non First Normal Form Relations: An Algebra Allowing Data Restructuring, Journal of Comp. and System Science 33, 361-393, 1986
[2] S. Abiteboul, R. Hull, IFO: A Formal Semantic Database Model, ACM Trans. Database Syst. 12, 4 (1987), 525-565
[3] S. Abiteboul, P.C. Kanellakis, Object Identity as Query Language Primitive, ACM SIGMOD 1989
[4] S. Abiteboul, V. Vianu, Datalog Extensions for Database Queries and Updates, Journal of Comp. and Sys. Science 43, 1991
[5] S. Abiteboul, C. Beeri, On the Power of the Languages For the Manipulation of Complex Objects, Verso Report No.4, INRIA, France, Dec. 1993
[6] A.V. Aho, R. Sethi, J.D. Ullman, Compilers, Addison-Wesley Publishing Company, 1987
[7] R. Agrawal, N.H. Gehani, ODE (Object Database and Environment): The Language and Data Model, ACM SIGMOD 1989
[8] J. Annevelink, Database Programming Languages: A Functional Approach, ACM SIGMOD, 1991
[9] M. Atkinson et al., The Object-Oriented Database System Manifesto, Proc. First Int'l Conf. on Deductive and Object-Oriented Databases, Elsevier Science Publishers B.V., Amsterdam, 1989, pp. 40-57
[10] J. Backus, Can programming be liberated from the von Neumann style? A functional style and its algebra of programs, Commun. ACM, Vol.21, No.8, August 1978, pp. 613-641

[11] F. Bancilhon, S. Cluet, C. Delobel, A Query Language for the O2 Object-Oriented Database System, Proc. 2nd Workshop on Database Programming Languages, 1989
[12] D.S. Batory, T.Y. Leung, T.E. Wise, Implementation Concepts for an Extensible Data Model and Data Language, ACM TODS, Vol.13, No.3, September 1988, pp. 231-262
[13] J. Banerjee et al., Data Model Issues for Object-Oriented Applications, ACM TOIS, Vol.5, No.1, January 1987, pp. 3-26
[14] C. Beeri, A Formal Approach to Object-Oriented Databases, Data & Knowledge Eng. 5 (1990), pp. 353-382
[15] E. Bertino et al., Object-Oriented Query Languages: The Notion and Issues, IEEE TKDE, Vol.4, No.3, June 1992
[16] P. Buneman, R.E. Frankel, FQL - A Functional Query Language, ACM SIGMOD, 1979
[17] P. Buneman, R.E. Frankel, R. Nikhil, An Implementation Technique for Database Query Languages, ACM TODS, Vol.7, No.2, June 1982, pp. 164-186
[18] L. Cardelli, A Semantics of Multiple Inheritance, Information and Computation, 76, 138-164, 1988
[19] M.J. Carey, D.J. DeWitt, S.L. Vandenberg, A Data Model and Query Language for EXODUS, ACM SIGMOD 1988
[20] S. Ceri et al., Algres: An Advanced Database System for Complex Applications, IEEE Software, July 1990
[21] E.F. Codd, A relational model of data for large shared data banks, Comm. ACM, 1970, 13, 377-387
[22] L.S. Colby, A Recursive Algebra and Query Optimization for Nested Relations, ACM SIGMOD, 1989
[23] An Introduction to GNU E, The E Reference Manual and The Design of the E Programming Language, Exodus Project Documents, University of Wisconsin-Madison, 1992
[24] Exodus Storage Manager, Exodus Project Documents, University of Wisconsin-Madison

[25] S. Danforth, P. Valduriez, A FAD for Data Intensive Applications, IEEE Trans. on Know. and Data Eng., Vol.4, No.1, Feb. 1992
[26] T. Davis, cppp documentation, Brown University, 1993-94
[27] O. Deux et al., The Story of O2, IEEE TKDE, Vol.2, No.1, March 1990
[28] R. Fikes, T. Kehler, The Role of Frame-Based Representation in Reasoning, Comm. of ACM, Vol.28, No.9, Sept. 1985
[29] D.H. Fishman et al., Overview of the Iris DBMS, 10th chapter in Object-Oriented Concepts, Databases and Applications, editor W. Kim
[30] G. Gardarin, P. Valduriez, ESQL2 - extending SQL2 to support object-oriented and deductive databases, Rapports de Recherche No. 1648, Inria, 1992
[31] A. Gill, Applied Algebra for Computer Science, Prentice-Hall, 1976
[32] M.L. Ginsberg, Multi-valued logics, Readings in Non-Monotonic Reasoning, pages 251-255, Morgan-Kaufmann, 198
[33] P. Gray, Logic, Algebra and Databases, Ellis Horwood Limited, 1984
[34] S. Grumbach, T. Milo, Towards Tractable Algebras for Bags, ACM SIGMOD, 1993
[35] D.V. Gucht, Multilevel Nested Relational Structures, Journal of Comp. and Sys. Sciences 36, 77-105, 1988
[36] M. Gyssens, D.V. Gucht, The powerset algebra as a result of adding programming constructs to the nested relational algebra, ACM SIGMOD, 1988
[37] M. Hammer, D. McLeod, Database Description with SDM: A Semantic Database Model, ACM Trans. Database Syst. 6, 3 (1981), 351-386
[38] R. Hull, R. King, Semantic Database Modeling: Survey, Applications, and Research Issues, ACM Computing Surveys, Vol.19, No.3, September 1987
[39] R. Hull, C.K. Yap, The Format Model: A Theory of Database Organization, Journal of the ACM 31, 3 (1984), 319-357
[40] V. Josifovski, Razsiritev jezika E za deklarativno povprasevanje, Diplomska naloga, 1994

[41] M. Kifer, G. Lausen, F-Logic: A Higher-Order Language for Reasoning about Objects, Inheritance, and Scheme, ACM SIGMOD 1989
[42] M. Kifer et al., Querying Object-Oriented Databases, ACM SIGMOD 1992
[43] M. Kifer, G. Lausen, J. Wu, Logical Foundations of Object-Oriented and Frame-Based Languages, Technical Report 93/06, Dept. of Computer Science, SUNY at Stony Brook
[44] W. Kim, A Model of Queries for Object-Oriented Databases, Proc. of the 15th Conf. on VLDB, 1989
[45] H.F. Korth, A. Silberschatz, Database System Concepts, McGraw-Hill Book Company, 1986
[46] G.M. Kuper, M.Y. Vardi, A New Approach to Database Logic, ACM SIGMOD, 1984
[47] G.M. Kuper, M.Y. Vardi, The Logical Data Model, ACM TODS, Vol.18, No.3, Sept. 1993
[48] C. Lecluse, P. Richard, F. Velez, O2, an Object-Oriented Data Model, ACM SIGMOD 1988
[49] C. Lecluse, P. Richard, The O2 Database Programming Language, Proc. of 15th Int. Conf. on Very Large Data Bases
[50] T.W. Leung et al., The Aqua Data Model and Algebra, Technical Report No. CS-93-09, Brown University, March 1993
[51] L. Liu, Exploring Semantics in Aggregation Hierarchies for Object-Oriented Databases, IEEE Data Eng., 1992
[52] L. Liu, A formal approach to Structure, Algebra & Communications of Complex Objects, Ph.D. thesis, 1992
[53] M. Mannino, I.J. Choi, D.S. Batory, The Object-Oriented Functional Data Language, IEEE TOSE, Vol.16, No.11, Nov. 1990
[54] G.A. Mitchell, Extensible Query Processing in an Object-Oriented Database, Ph.D. thesis, Brown University, 1993
[55] S.B. Navathe, A. Cornelio, Modelling Physical Systems by Complex Structural Objects and Complex Functional Objects, Int. Conf. on Extending Database Technology 1990, published in Lecture Notes in Computer Science 416
[56] H.R. Nielson, F. Nielson, Semantics with applications, A formal introduction, John Wiley & Sons Ltd., 1992
[57] J. Orenstein, S. Haradhvala, B. Margulies, D. Sakahara, Query Processing in the ObjectStore Database System, ACM SIGMOD 1992
[58] M.P. Papazoglou, Unraveling the Semantics of Conceptual Schemas, to appear in Comm. of ACM
[59] A. Poulovassilis, P. King, Extending the Functional Data Model to Computational Completeness, Int. Conf. on Extending Database Technology 1990, published in Lecture Notes in Computer Science 416
[60] N. Prijatelj, Matematične strukture I, Mladinska knjiga, 1964
[61] J.E. Richardson, M.J. Carey, Programming Constructs for Database Systems Implementation in EXODUS, ACM SIGMOD, 1987
[62] M.A. Roth, H.F. Korth, A. Silberschatz, Extended algebra and calculus for non 1NF relational databases, ACM Trans. Database Syst., 1988, Vol.13, No.4, 389-417
[63] I. Savnik, T. Mohoric, T. Dolenc, F. Novak, A Database Model for Design Data, Proceedings of COMPEURO-93, IEEE, May 1993, Paris
[64] I. Savnik, Extending Database Programming Language by Ad hoc Query Facilities, Technical Report #12/93, November 1993, Queensland University of Technology
[65] I. Savnik, T. Mohoric, T. Dolenc, F. Novak, Modeling dynamic dependencies by functional database model, Microprocessing and Microprogramming, 1993, Vol. 37, pp. 187-190
[66] I. Savnik, T. Mohoric, V. Josifovski, Extending Database Programming Language with Declarative Querying Facilities, Microprocessing and Microprogramming, 1995, Vol. 40, pp. 905-908
[67] I. Savnik, R. Ceglar, F. Novak, T. Dolenc, Semantic Database Model for Geometric Design Data, 13. Int. Conference on Information Technology Interface, ITI'91, Cavtat, 1991
[68] I. Savnik, Model kompleksnih objektov, Technical Report, IJS DP-6535, 1992
[69] I. Savnik, The Query Language for Engineering Applications, 14. Int. Conference on Information Technology Interface, ITI'92, Pula, 1992
[70] G.M. Shaw, S.B. Zdonik, A Query Algebra for Object-Oriented Databases, Proc. of Data Eng., IEEE, 1990
[71] D.W. Shipman, The Functional Data Model and the Data Language DAPLEX, ACM TODS, Vol.6, No.1, March 1981
[72] M.H. Scholl, H.-J. Schek, A Synthesis of Complex Objects and Object-Orientation, OO Databases: Analysis, Design & Construction (DS-4), R.A. Meersman, W. Kent, S. Khosla (Editors), Elsevier Science Publishers, IFIP, 1991
[73] SICStus Prolog User's Manual, Swedish Institute of Computer Science, October 1991
[74] J.M. Smith, D.C.P. Smith, Database abstractions: Aggregation, Comm. of the ACM, June 1977, Vol.20, No.6
[75] S.Y. Su, Modelling Integrated Manufacturing Data with SAM*, IEEE Computer, January 1986
[76] S.Y.W. Su, V. Krishnamurthy, H. Lam, An Object-oriented Semantic Model, Chapter 17 in Artificial Intelligence: Manufacturing Theory and Practice, edited by S.T. Kumara et al., Industrial and Engineering Press, Norcross, GA, 1989
[77] S.Y. Su, M. Gou, H. Lam, Association Algebra: A Mathematical Foundation for Object-Oriented Databases, IEEE Trans. on Knowledge and Data Eng., Vol.5, No.5, Oct. 1993
[78] M. Stonebraker et al., Extending a Database System with Procedures, ACM TODS, Vol.12, No.2, Sept. 1987
[79] J. Tillquist, F.Y. Kuo, An approach to the recursive retrieval problem in the relational databases, Comm. of the ACM, Feb. 1989, Vol.32, No.2
[80] B. Vance, Towards an object-oriented query algebra, Tech. Report CS/E91-008, Dept. Comp. Science and Eng., Oregon Graduate Institute, Jan. 1991
[81] S.L. Vandenberg, D.J. DeWitt, Algebraic Support for Complex Objects with Arrays, Identity, and Inheritance, ACM SIGMOD 1991
[82] S.L. Vandenberg, Algebras for Object-Oriented Query Languages, Ph.D. thesis, Technical Report No. 1161, University of Wisconsin

Appendix A Extended abstract

The capabilities of some recent object-oriented database management systems (OODBMSs) [27, 57, 19] cover those of relational DBMSs. These systems store large amounts of data in a distributed computing environment, include a query language that usually covers the capabilities of the relational query languages SQL or QUEL [45], and as a rule offer a variant of a database programming language that tightly couples a high-level programming language with the database. The object-oriented data model and the languages for working with OODBMSs support efficient work with complex objects, which are common in application environments such as design, industry, office and business.

Many research problems concerning OODBMSs remain open. In this thesis we deal with the design of an algebra and a declarative language for working with objects, and with the problem of formalizing the corresponding object-oriented data model. We use the formalization of the object-oriented data model to study the properties of the data model and to design languages for working with objects. Despite considerable efforts to formalize the object-oriented data model [48, 3, 52, 43, 14, 39], there is no suitable solution comparable, for example, to the formal foundation of the relational model. The difficulty of building such a foundation stems mainly from the complexity of the model: it contains constructs for describing the static structure of objects and constructs for representing the behavior of objects, and it is based on the principles of inheritance, encapsulation and overriding of properties.

While the basic constructs of the object-oriented data model are already defined and established [41], the object algebra has not yet been consolidated. Recent object algebras [5, 46, 82, 70, 52, 50, 77, 80] differ considerably in approach and sometimes differ substantially in their sets of basic operations. Given that the operations are defined within the framework of the object-oriented data model, which is fixed and common to all object algebras, this situation may be a consequence of the different starting points of the algebras; among these we count relational algebra [21], nested relational algebra [1, 62] and functional languages [10].

Two goals guide the design of the object algebra and the declarative language for working with objects. First, we want to define the object algebra in accordance with the object-oriented data model. Second, the operations of the designed algebra are used as the basis for a functional query language that should be suitable for integration with an object-oriented database programming language. We call the resulting language OVAL.

In designing the object algebra we address the following problems: the design of a formalization of the object-oriented data model that supports the definition of the object algebra, the design of operations for working with nested components of objects, the use of the conceptual schema in querying the database, and the design of an algorithm for static type checking of OVAL queries. We now look at these problems in more detail.

With the formalization of the object-oriented data model we want to define the properties of objects and lay the groundwork for the definition of the operations of the object algebra. As shown later, some constructs of the formalization are used in the definition of the data model operations.

Unlike relational algebra and nested relational algebra, an object algebra must contain operations for working with the components of objects, which are built by arbitrary use of the set and tuple constructors. We claim that existing object algebras do not offer suitable operations for working with nested components of objects. We therefore study alternative operations for working with nested components of objects.

Unlike the relational model and the nested relational model, the object-oriented data model offers a richer set of constructs for describing the conceptual schema of a database. Some properties of the modeled objects can be represented by the conceptual schema (for example, the classification of objects) and not only as data directly attached to the stored objects. A database language must therefore be able to work with the conceptual schema.

Finally, we want to design an algorithm for static type checking of OVAL queries. In the thesis we examine the possibilities of defining such an algorithm under the condition that the language permits the use of the substitution principle.

Data model

We describe the object-oriented data model formally, which prepares the ground for the definition of operations on objects. The proposed formalization gives a uniform description of the conceptual schema and of the data objects in the database, as is possible with frames [28]. The uniform treatment of data objects and the conceptual schema simplifies the interpretation of some key concepts of the object-oriented data model and allows simple operations for working with the conceptual schema of the database.

Static structure of objects

The basic constructs of the data model formalization are the object and the value. A value is either simple or structured. Simple values are strings, integers and so on. Structured values are built with the set and tuple constructors. For example, the value [ime : "Tone", starost : 34, naslov : "Ljubljana"] (the attributes stand for name, age and address) describes some properties of a person named Tone.

An object is formally represented by a pair <i, v>, where i is the object identifier and v is the value of the object. The object identifier uniquely determines the object within the database. The value of the object holds the properties of the object. Objects are divided into simple and structured ones. The value of a simple object is equal to its identifier; simple objects are, for example, strings and integers. Structured objects represent concrete entities or abstract concepts. A structured object can describe, for example, the person named Tone as well as the abstract concept oseba (person).

A class has a dual role in the model. It can be treated as an ordinary object or as a template from which instances of the class are created [9]. As an ordinary object, a class represents a concept that is an abstraction of the set of instances of the class. This set of objects is called the interpretation of the class. More precisely, the interpretation of a class C is a set of object identifiers. The intersection of the interpretations of any two different classes is empty, so every object belongs to exactly one class.

The set of classes that defines the conceptual schema of the database is ordered by the relation is-subclass, denoted $\preceq_i$. The relation is reflexive, antisymmetric and transitive, so it defines a partial order on classes. The instances of the classes are then added to the ordered set of classes: every instance of a given class is related to that class by $\preceq_i$. The result is a partially ordered set that consists of all objects of the database. On the basis of this order we define the extended interpretation of a class: the set of objects that contains the instances of the given class and the instances of all its subclasses. For example, the extended interpretation of the class oseba also contains the instances of the classes student, usluzbenec (employee) and so on.

The properties of a set of objects are described by a type. Formally, a type is a pair (S, P), where S describes the structure and P the behavior of the set of objects. We first consider the description of structure. The structural part of a type is called the structural type. Again we distinguish simple and composite structural types. A simple structural type is determined by a class. For example, the class int determines a simple type. Similarly, the class oseba determines a simple type whose instances are the object identifiers that represent persons. A composite type is a value built with the set and/or tuple constructors; its components are simpler structural types. For example, the value [ime : string, starost : int, naslov : string, zaposlen : organizacija] can represent the properties of persons. The structural type here is composed of the simple types int, string and organizacija (organization).

The interpretation of a structural type is the set of values whose structure is determined by the given structural type. The interpretation of a simple type is given by the extended interpretation of the class that determines it. The interpretation of a tuple-structured type $[a_1 : S_1, \ldots, a_n : S_n]$ is the set of all tuples $[a_1 : s_1, \ldots, a_n : s_n]$ in which each $s_i$ belongs to the interpretation of $S_i$, $1 \leq i \leq n$. The interpretation of a set-structured type $\{S\}$ is the set of all subsets of the interpretation of $S$. The interpretation of a type thus defines all possible instances of the given structural type. For simple objects, this definition supports the substitution principle: every object component or variable of the simple type determined by a class C can be replaced by an instance of a subclass of C.

We now look at the connection between classes and structural types more closely. We have already seen that the set of all types contains the set of classes of a database. Conversely, the value of an object that represents a class describes all structural properties of the class and fits the definition of a structural type given above. The value of a class is therefore a structural type. For example, the object (oddelek, [ime : string, zaposleni : {oseba}]) determines the class oddelek (department), whose structural type is [ime : string, zaposleni : {oseba}].

Like objects and object identifiers, values are also ordered. We first define the order on types. The relation between types is called is-subtype and is denoted $\preceq_o$. Simple types are represented by classes: a simple type $T_1$ is a subtype of a simple type $T_2$ if $T_1$ is a subclass of $T_2$. A structural type $\{S_1\}$ is a subtype of $\{S_2\}$, written $\{S_1\} \preceq_o \{S_2\}$, if $S_1 \preceq_o S_2$. Similarly, $[a_1 : T_1, \ldots, a_k : T_k] \preceq_o [a_1 : S_1, \ldots, a_n : S_n]$ if $k \geq n$ and $T_i \preceq_o S_i$ for $1 \leq i \leq n$.

As with the partial order on classes, the partial order on types can be extended by adding all instances of the types: every instance of a given type is related to that type by $\preceq_o$. The relation $\preceq_o$ is also called is-more-specific. The extended partial order covers all values that occur in the database. As for classes, we use $\preceq_o$ to define the extended interpretation of a type, which contains all instances of the type and the instances of all its subtypes defined in the database. The extended interpretation of a type gives the formal basis for applying the substitution principle to arbitrary values: every object component or variable value of an arbitrary type T can be replaced by an instance of an existing subtype of T.
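The subtype relation on structural types described above can be sketched in Python. This is a minimal illustration, not the thesis's implementation: the class hierarchy, the encoding of set types as `("set", S)` pairs and of tuple types as dictionaries, and the function names are all assumptions made for the example.

```python
# Hypothetical class hierarchy: each class maps to its direct superclasses.
SUPERCLASSES = {
    "oseba": [],
    "student": ["oseba"],
    "usluzbenec": ["oseba"],
    "int": [],
    "string": [],
}

def is_subclass(c1, c2):
    """Reflexive and transitive is-subclass test over SUPERCLASSES."""
    if c1 == c2:
        return True
    return any(is_subclass(p, c2) for p in SUPERCLASSES.get(c1, []))

def is_set_type(t):
    """Set types are encoded as the pair ("set", element_type)."""
    return isinstance(t, tuple) and len(t) == 2 and t[0] == "set"

def is_subtype(t1, t2):
    """Return True if structural type t1 is a subtype of t2."""
    # Simple types are class names: subtype means subclass.
    if isinstance(t1, str) and isinstance(t2, str):
        return is_subclass(t1, t2)
    # {S1} is a subtype of {S2} if S1 is a subtype of S2.
    if is_set_type(t1) and is_set_type(t2):
        return is_subtype(t1[1], t2[1])
    # A tuple type is a subtype if it has at least the attributes of t2
    # and each shared attribute has a subtype of the expected type.
    if isinstance(t1, dict) and isinstance(t2, dict):
        return all(a in t1 and is_subtype(t1[a], s) for a, s in t2.items())
    return False

person_t = {"ime": "string", "prijatelj": "oseba"}
student_t = {"ime": "string", "prijatelj": "student", "letnik": "int"}
```

With these definitions, `is_subtype(student_t, person_t)` holds because `student_t` has every attribute of `person_t` with a more specific type, while the converse fails on the missing attribute `letnik`.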

Behavior of objects

The behavior of the instances of a class is described by a set of methods. A method is represented by a signature and an implementation, that is, an algorithm that computes an object from a set of parameters. The signature describes the basic properties of the method. A signature is an expression $m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$, where $m$ is the name of the method and $c_0$ is the class in which the method is defined; the remaining types $c_i$ are the types of the parameters of $m$, and $c$ is the type of the result.

The interpretation of a signature is defined similarly to the interpretation of a composite structural type. To express the properties of signatures we need only the extended interpretation of a signature. The extended interpretation of the signature $m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$ is the set of all partial functions from the product of the extended interpretations of $c_0, \ldots, c_k$ into the extended interpretation of $c$.

Several properties of methods follow from this definition. A method with the signature $m : c_0 \times c_1 \times \ldots \times c_k \rightarrow c$ is inherited by all subclasses of the class $c_0$. The value of a parameter of type $c_i$ can be replaced by an instance of any subtype of $c_i$. Finally, the result of the method can be an instance of the type $c$ or an instance of any subtype of $c$. Like classes and types, signatures can be partially ordered: for signatures $s_1$ and $s_2$, $s_1 \preceq_o s_2$ holds exactly when the extended interpretation of $s_1$ is contained in the extended interpretation of $s_2$.

When methods are inherited, a class may end up with several methods of the same name. Such cases are resolved by the overriding rule: from the set of methods with the same name, the method with the least signature with respect to the partial order on signatures is chosen. With multiple inheritance such a method may not exist. In that case the user must state explicitly which method is to be executed by specifying the class in which the method is defined.
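The overriding rule can be illustrated with a short Python sketch. It assumes that signatures of equal length are compared component by component with the is-subclass relation, which is one way to read the inclusion of sets of partial functions; the hierarchy, the tuple encoding `(c0, ..., ck, c)` and the names `sig_leq` and `resolve` are hypothetical.

```python
# Hypothetical hierarchy with multiple inheritance (asistent has two parents).
SUPERCLASSES = {
    "oseba": [],
    "student": ["oseba"],
    "usluzbenec": ["oseba"],
    "asistent": ["student", "usluzbenec"],
    "string": [],
}

def is_subclass(c1, c2):
    """Reflexive and transitive is-subclass test."""
    if c1 == c2:
        return True
    return any(is_subclass(p, c2) for p in SUPERCLASSES[c1])

def sig_leq(s1, s2):
    """Assumed ordering: s1 precedes s2 if every component is a subclass."""
    return len(s1) == len(s2) and all(is_subclass(a, b) for a, b in zip(s1, s2))

def resolve(candidates):
    """Return the least signature among same-named methods, or None."""
    for s in candidates:
        if all(sig_leq(s, other) for other in candidates):
            return s
    return None  # no least element: the caller must name the defining class

# A method redefined in a subclass: the more specific signature wins.
single = [("oseba", "string"), ("student", "string")]
# Two unrelated definitions inherited by asistent: no least signature exists.
diamond = [("student", "string"), ("usluzbenec", "string")]
```

Here `resolve(single)` selects the definition in `student`, while `resolve(diamond)` returns `None`, which corresponds to the case where the user has to specify the defining class explicitly.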

Object algebra and query language

The object algebra is designed in accordance with the presented formalization of the object-oriented data model. The operations of the object algebra serve as the basis for the definition of the declarative query language OVAL. Compared with the relational model and the nested relational model, the object-oriented data model offers a richer set of constructs for describing the classification and composition of objects. In designing the operations on objects we therefore paid most attention to constructs for working with the conceptual schema and with the components of objects.

The operations of the proposed object algebra and query language are divided into data model operations and basic operations of the algebra. Data model operations query those properties of objects that are defined by the constructs of the data model. They give access to the values of data objects, compare objects, and support simple work with the elements of the conceptual schema: classes, properties of classes and relationships between classes. We see two reasons for working with the conceptual schema. First, users are often interested in the links between data objects and classes as well as in the relationships between classes. For example, a user may want to know whether the object janez is an instance of the class student, or may be interested in all superclasses of the class student. Second, querying the conceptual schema lets the user form a picture of the structure of the database. In a database with a complex data structure this eases querying and helps when the conceptual schema is changed, for example when new classes are added.

The basic operations of the object algebra search for objects stored in the database and change their structure and content. They use the data model operations as a tool for querying the properties of objects. All basic operations are functions that can be combined into expressions by function composition and higher-order functions. The resulting expressions form the functional query language OVAL. Building queries by function composition forces the programmer to define queries on complex objects in a modular way: a query is defined as a composition of simpler queries, which splits the problem into simpler subproblems. The functional nature of OVAL matches the syntax and semantics of database programming languages (DBPLs) based on C++. OVAL can therefore be integrated easily with a DBPL, which is thereby extended with constructs for expressing queries declaratively. In the following we review the data model operations and the basic operations of the algebra.

Data model operations

The data model operations are defined on the basis of the constructs used in the formalization of the data model. The mapping between object identifiers and the structured values that describe the properties of objects is realized by the function val, which assigns to every object identifier the value of the corresponding object. If the object identifier represents a class, its value is the structural type that describes the properties of the class.

The ordinary and the extended interpretation of a class are realized by the functions ext and exts. For an object identifier that represents a class, ext returns the set of object identifiers that represent the instances of the class. For a given class, exts returns the set of object identifiers that is the union of the interpretations of the class and of all its subclasses.

Objects, that is, object identifiers, can be compared with the operation $\preceq_i$, which defines the partial order on objects. For example, the expression $x \preceq_i$ student determines all objects that are more specific than the object student. This set also contains more specific classes, such as doktorski student (doctoral student). The relation $\preceq_i$ can thus express properties of an object that concern the partial order of the objects of the database. Similarly, the relation $\preceq_o$, which defines the partial order on values, expresses properties of a set of values. The expression $x \preceq_o$ [ime : string, starost : 34, zaposlen : institut], for example, determines all values that have the attributes ime, starost and zaposlen defined. The selected values must have the number 34 as the value of the component starost, and the persons must be employed at an organization represented by the class institut or by a more specific object.

Relationships similar to those described by $\preceq_i$ can also be expressed with the operations class_of, subcl and supcl. The operation class_of assigns to a data object the class to which it belongs. The operations subcl and supcl return, for a given class, the sets of all its subclasses and superclasses, respectively.

The operations lub-set and glb-set determine the common properties of a set of objects. The operation lub-set assigns to the input set of objects the set of all objects that are more general than the objects of the input set and are "closest" to them with respect to the partial order on objects. Similarly, glb-set returns the set of more specific objects that are "closest" to the elements of the input set with respect to the partial order on objects.

Finally, we present the operations for comparing objects. Two objects are identical if they have the same object identifier. Two objects are completely equal by value if they agree in all properties. We also define equality with respect to the properties of a given class: two objects are equal with respect to the properties of a class C if they agree in all properties defined by C.
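A small Python model may help to make the schema operations concrete. It is only a sketch under stated assumptions: the hierarchy and instance tables are hypothetical, the function names follow the operations named in the text (with `class_of` and `lub_set` written as Python identifiers), and `lub_set` is restricted to classes rather than arbitrary objects.

```python
# Hypothetical schema: class -> set of direct superclasses.
SUPER = {
    "oseba": set(),
    "student": {"oseba"},
    "usluzbenec": {"oseba"},
    "doktorski_student": {"student"},
}
# Hypothetical instances: object identifier -> its class.
CLASS_OF = {"janez": "student", "ana": "doktorski_student", "tone": "usluzbenec"}

def class_of(o):
    """The class to which the data object o belongs."""
    return CLASS_OF[o]

def supcl(c):
    """All superclasses of class c (transitively)."""
    result = set()
    for p in SUPER[c]:
        result |= {p} | supcl(p)
    return result

def subcl(c):
    """All subclasses of class c (transitively)."""
    return {d for d in SUPER if c in supcl(d)}

def ext(c):
    """Ordinary interpretation: instances of exactly class c."""
    return {o for o, k in CLASS_OF.items() if k == c}

def exts(c):
    """Extended interpretation: instances of c and of all its subclasses."""
    result = ext(c)
    for d in subcl(c):
        result |= ext(d)
    return result

def lub_set(classes):
    """Closest common generalizations of a non-empty collection of classes."""
    common = set.intersection(*[supcl(c) | {c} for c in classes])
    # Keep only the most specific common generalizations.
    return {c for c in common if not any(d != c and c in supcl(d) for d in common)}
```

For example, `exts("student")` contains both `janez` and `ana`, whereas `ext("student")` contains only `janez`, and `lub_set(["doktorski_student", "usluzbenec"])` yields the single class `oseba`.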

Basic operations of the algebra

The basic operations of the object algebra are used to query the database. The operations of the object algebra are: apply, select, union, difference, intersect, tuple, group, unnest and apply_at. Every operation of the algebra is a function whose argument is a set of objects. The result of an operation is a value whose structure depends on the operation. Some of the operations are higher-order functions whose parameters are operations or queries. Operations can be combined into queries by function composition and by higher-order functions. We now look at each operation in more detail.

The operation apply executes an arbitrary data model operation, selects a component of a tuple, or evaluates a query on every element of the input set. The last use gives access to arbitrarily nested sets of a complex object.

The operation select chooses objects from the input set of objects. Its parameter is a selection expression that is evaluated for every element of the input set. The selection expression may use all data model operations listed above and the usual operations for comparing simple objects found in SQL-based query languages [45].

The operations union, difference and intersect are defined as in relational algebra. Objects are compared by value equality when these operations are evaluated.

The operation close computes the transitive closure of a set of objects with respect to the values of a function given as the parameter of close. It supports work with recursively defined data structures such as trees and directed graphs.

The operation tuple is a generalization of the relational projection. It has a set of parameters, each of which determines the value of one component of the tuples that form the result. Every parameter is defined by an attribute name and a query that is evaluated on every element of the input set and returns the value of the corresponding tuple component.

The operation group forms groups of objects from the input set of objects. Groups are formed according to the value of the first parameter of group, which is evaluated on every element of the input set. The result is a set of pairs. The first component of each pair holds the characteristic value of the query given by the first parameter. The second component holds the set of objects obtained by evaluating the second parameter on all elements of the input set that have the same value of the first parameter.

The operation unnest unnests an arbitrary component of the composite objects of the input set. It can form a simple set from a set of sets, merge tuples with their enclosing tuples, or unnest sets of objects that are values of tuple components.

Finally, the operation apply_at evaluates an arbitrary query on the components of composite objects. The operation has two parameters. The first is a dot expression that identifies an arbitrary component in the objects of the input set. The second is a query that is executed on the selected component.
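The behavior of several of these operations can be sketched in Python. The sketch makes a number of simplifying assumptions that are not taken from the thesis: sets are modeled as Python lists (duplicates are not removed by apply), tuples as dictionaries, queries as Python functions, and the dot expression of apply_at as a list of attribute names; close is assumed to return the input set together with everything reachable from it.

```python
def apply(xs, q):
    """Evaluate query q on every element of the input set."""
    return [q(x) for x in xs]

def select(xs, pred):
    """Keep the elements for which the selection expression holds."""
    return [x for x in xs if pred(x)]

def group(xs, a, key_q, b, val_q):
    """Group val_q results by the value of key_q; one tuple [a, b] per group."""
    groups = {}
    for x in xs:
        groups.setdefault(key_q(x), []).append(val_q(x))
    return [{a: k, b: vs} for k, vs in groups.items()]

def unnest(xss):
    """Flatten a set of sets into a simple set."""
    return [x for xs in xss for x in xs]

def close(xs, f):
    """Closure of xs under the set-valued function f."""
    result = list(xs)
    for x in result:  # result grows during iteration, so new elements are visited too
        for y in f(x):
            if y not in result:
                result.append(y)
    return result

def apply_at(xs, path, q):
    """Evaluate q on the component selected by the attribute path."""
    def rewrite(value, attrs):
        if not attrs:  # empty path: apply q to the value itself
            return q(value)
        head, rest = attrs[0], attrs[1:]
        return {**value, head: rewrite(value[head], rest)}
    return [rewrite(x, path) for x in xs]

# Hypothetical data: one department with a nested set of employees.
oddelki = [{"ime": "razvoj",
            "zaposleni": [{"ime": "Tone", "starost": 34},
                          {"ime": "Ana", "starost": 29}]}]
# Select, inside each department, the employees older than 30.
starejsi = apply_at(oddelki, ["zaposleni"],
                    lambda emps: select(emps, lambda e: e["starost"] > 30))
```

The example query leaves the structure of each department intact and replaces only the nested component `zaposleni`, which is the point of apply_at: the inner query is an ordinary query and needs no knowledge of where the component is nested.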

Implementation

Two prototypes of the functional query language OVAL have been realized. The first prototype was implemented in Prolog and serves as an experimental environment for studying the properties of the data model and the language described above. This implementation is not tied to any concrete data model, in contrast to the second prototype, which is tied to the data model of the programming language C++. The prototype is therefore flexible enough to allow experimentation with the constructs of the language. Within this prototype we design and implement the procedure for type checking of queries. The algorithm determines the type of the result of a query statically, using the substitution principle and respecting the constraints imposed by the data model.

The second prototype is realized as an extension of the database programming language E [23], in which E is extended with constructs for declarative querying. One of the main purposes of the prototype is to study the integration of the query language OVAL with an object-oriented database programming language. The functional nature of OVAL allows a tight integration of the two languages. Queries are realized by extending the syntax and semantics of dot expressions, which are used in C++ to access the components of an object and to apply methods to objects. The prototype is realized as a preprocessor that translates the extensions of E into statements of E. The preprocessor is built on cppp, a program for syntax checking of C++ developed at Brown University [26]. The OVAL constructs realized so far are the extension of E with the concept of class extension and the operations apply and select. We consider the implemented constructs to reflect the basic characteristics of OVAL.

Keywords: databases, object-oriented databases, database models, conceptual models, data model formalization, complex objects, database algebras, object algebra, query languages.


Appendix B Type checking rules

The type of an o-value resulting from the evaluation of an OVAL query on a database is derived by a set of type checking rules. The rules presented in this appendix are read as follows: the if part of a rule is evaluated and, if it succeeds, the expression following then is returned as the resulting type. If the if part does not succeed, the expression following else is returned. If the if part does not succeed and the rule contains no else part, the next rule describing the same operation is evaluated. If fail is reached, type checking fails and the type checking routine stops.

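Algebraic operations

\[ \mathcal{T}[\![Q_1.\mathit{apply}(Q_2)]\!] = \text{if } \mathcal{T}[\![Q_1]\!] = \{T_1\} \text{ and } T_2 = \mathcal{T}[\![Q_2]\!] \text{ then } \{T_2\} \text{ else fail} \]

\[ \mathcal{T}[\![Q_1.\mathit{select}(E)]\!] = \text{if } \mathcal{T}[\![Q_1]\!] = \{T_1\} \text{ and } \mathcal{T}[\![E]\!] = \mathit{bool} \text{ then } \{T_1\} \text{ else fail} \]

\[ \mathcal{T}[\![Q_1.\mathit{close}(Q_2)]\!] = \text{if } \mathcal{T}[\![Q_1]\!] = \{T_1\} \text{ and } \mathcal{T}[\![Q_2]\!] = T_2 \text{ then } \{\mathit{lub}(T_1, T_2)\} \text{ else fail} \]

\[ \mathcal{T}[\![Q_1.\mathit{union}(Q_2)]\!] = \text{if } \mathcal{T}[\![Q_1]\!] = \{T_1\} \text{ and } \mathcal{T}[\![Q_2]\!] = \{T_2\} \text{ then } \{\mathit{lub}(T_1, T_2)\} \text{ else fail} \]

Rules for differ and intsc are the same.

\[ \mathcal{T}[\![Q.\mathit{tuple}(A_1 : Q_1, \ldots, A_n : Q_n)]\!] = \text{if } \mathcal{T}[\![Q]\!] = T \text{ and } \mathcal{T}[\![Q_i]\!] = T_i,\ \forall i \in [1..n] \text{ then } [A_1 : T_1, \ldots, A_n : T_n] \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{group}(A : Q_1, B : Q_2)]\!] = \text{if } \mathcal{T}[\![Q]\!] = \{T\} \text{ and } \mathcal{T}[\![Q_1]\!] = T_1 \text{ and } \mathcal{T}[\![Q_2]\!] = T_2 \text{ then } \{[A : T_1, B : \{T_2\}]\} \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{unnest}]\!] = \text{if } \mathcal{T}[\![Q]\!] = \{\{T\}\} \text{ then } \{T\} \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{unnest}(A_k)]\!] = \text{if } \mathcal{T}[\![Q]\!] = [A_1 : T_1, \ldots, A_k : T_k, \ldots, A_n : T_n] \text{ and } T_k = \{T_u\},\ 1 \leq k \leq n \text{ then } [A_1 : T_1, \ldots, A_k : T_u, \ldots, A_n : T_n] \]

\[ \mathcal{T}[\![Q.\mathit{unnest}(A_k)]\!] = \text{if } \mathcal{T}[\![Q]\!] = [A_1 : T_1, \ldots, A_k : T_k, \ldots, A_n : T_n] \text{ and } T_k = [B_1 : P_1, \ldots, B_l : P_l] \text{ then } [A_1 : T_1, \ldots, B_1 : P_1, \ldots, B_l : P_l, \ldots, A_n : T_n] \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{apply}(\mathit{null}, Q_1)]\!] = \text{if } \mathcal{T}[\![Q]\!] = T \text{ and } \mathcal{T}[\![\mathit{id} :: T.Q_1]\!] = T_1 \text{ then } T_1 \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{apply}(A_k.P, Q)]\!] = \text{if } \mathcal{T}[\![Q]\!] = T \text{ and } T = [A_1 : T_1, \ldots, A_k : T_k, \ldots, A_n : T_n] \text{ and } \mathcal{T}[\![\mathit{id} :: T_k.\mathit{apply}(P, Q)]\!] = T_p \text{ then } [A_1 : T_1, \ldots, A_k : T_p, \ldots, A_n : T_n] \text{ else fail} \]

\[ \mathcal{T}[\![Q.\mathit{apply}(A_k.P, Q)]\!] = \text{if } \mathcal{T}[\![Q]\!] = \{T\} \text{ and } T = [A_1 : T_1, \ldots, A_k : T_k, \ldots, A_n : T_n] \text{ and } \mathcal{T}[\![\mathit{id} :: T_k.\mathit{apply}(P, Q)]\!] = T_p \text{ then } \{[A_1 : T_1, \ldots, A_k : T_p, \ldots, A_n : T_n]\} \text{ else fail} \]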
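A few of the algebraic rules can be written as Python functions over the types of their operands, which also shows how a rule without an else part falls through to the next rule for the same operation. This is a sketch, not the Prolog procedure of the thesis: the encoding of set types as `("set", T)` and tuple types as dictionaries, the single-inheritance hierarchy used by `lub`, and all function names are assumptions.

```python
class TypeCheckError(Exception):
    """Raised when a rule reaches 'fail'."""

# Hypothetical single-inheritance hierarchy: class -> direct superclass.
PARENT = {"student": "oseba", "usluzbenec": "oseba", "oseba": None}

def ancestors(c):
    """The class c followed by its superclasses, most specific first."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT.get(c)
    return chain

def lub(t1, t2):
    """Least upper bound of two class types."""
    for c in ancestors(t1):
        if c in ancestors(t2):
            return c
    raise TypeCheckError(f"no common supertype of {t1} and {t2}")

def is_set(t):
    return isinstance(t, tuple) and len(t) == 2 and t[0] == "set"

def type_union(t_q1, t_q2):
    """Rule for union: both operands must be sets."""
    if is_set(t_q1) and is_set(t_q2):
        return ("set", lub(t_q1[1], t_q2[1]))
    raise TypeCheckError("union expects two sets")

def type_group(t_q, a, t1, b, t2):
    """Rule for group: the result is a set of pairs [a : T1, b : {T2}]."""
    if is_set(t_q):
        return ("set", {a: t1, b: ("set", t2)})
    raise TypeCheckError("group expects a set")

def type_unnest(t_q, attr=None):
    """Rules for unnest, tried in order."""
    if attr is None:
        if is_set(t_q) and is_set(t_q[1]):
            return t_q[1]
        raise TypeCheckError("unnest expects a set of sets")
    if not (isinstance(t_q, dict) and attr in t_q):
        raise TypeCheckError("unnest expects a tuple with the given attribute")
    t_k = t_q[attr]
    if is_set(t_k):  # first rule for unnest(Ak); it has no else part
        return {**t_q, attr: t_k[1]}
    if isinstance(t_k, dict):  # next rule for the same operation
        out = {}
        for name, t in t_q.items():
            if name == attr:
                out.update(t_k)  # splice the inner attributes in place
            else:
                out[name] = t
        return out
    raise TypeCheckError("component is neither a set nor a tuple")
```

For instance, `type_union(("set", "student"), ("set", "usluzbenec"))` gives a set of `oseba`, since `oseba` is the least common superclass in the assumed hierarchy.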

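Data model operations

\[ \mathcal{T}[\![o]\!] = o.\mathit{type} \]

\[ \mathcal{T}[\![o.A]\!] = \text{if } \mathcal{T}[\![o]\!] = [\ldots, C_1 :: A : T_1, \ldots, C_l :: A : T_l, \ldots] \text{ and } l \geq 1 \text{ and } \exists C_i\, \forall C_j : (C_i, C_j \in \{C_1, \ldots, C_l\} \wedge C_i \preceq_i C_j) \text{ then } T_i \text{ else fail} \]

\[ \mathcal{T}[\![o.C_k :: A]\!] = \text{if } \mathcal{T}[\![o]\!] = [\ldots, C_k :: A : T, \ldots] \text{ then } T \text{ else fail} \]

\[ \mathcal{T}[\![o.\mathit{val}]\!] = \text{if } \mathcal{T}[\![o]\!] = C \text{ and } C \in O_C \text{ then } \mathit{val}(C) \text{ else fail} \]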
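The two rules for attribute access can be sketched in Python as follows. The sketch assumes that the type of an object is given as a list of `(class, attribute, type)` triples, one for each class that defines the attribute; the hierarchy, the example types and the name `type_of_attr` are hypothetical.

```python
class TypeCheckError(Exception):
    """Raised when a rule reaches 'fail'."""

# Hypothetical hierarchy: class -> set of direct superclasses.
SUPER = {
    "oseba": set(),
    "student": {"oseba"},
    "usluzbenec": {"oseba"},
    "asistent": {"student", "usluzbenec"},
}

def is_subclass(c1, c2):
    """Reflexive and transitive is-subclass test."""
    return c1 == c2 or any(is_subclass(p, c2) for p in SUPER[c1])

def type_of_attr(obj_type, attr, cls=None):
    """Type of o.A, or of o.C::A when the defining class cls is given."""
    candidates = [(c, t) for c, a, t in obj_type if a == attr]
    if cls is not None:  # rule for o.Ck::A
        for c, t in candidates:
            if c == cls:
                return t
        raise TypeCheckError(f"{cls} does not define {attr}")
    for c, t in candidates:  # rule for o.A: pick the most specific definer
        if all(is_subclass(c, other) for other, _ in candidates):
            return t
    raise TypeCheckError(f"no most specific definition of {attr}")

# An attribute redefined along one inheritance path.
student_type = [("oseba", "naslov", "string"), ("student", "naslov", "dom")]
# An attribute inherited from two unrelated superclasses.
asistent_type = [("student", "mentor", "oseba"), ("usluzbenec", "mentor", "usluzbenec")]
```

With `asistent_type`, the unqualified access fails because neither defining class is below the other, while `type_of_attr(asistent_type, "mentor", cls="student")` succeeds, mirroring the explicit form o.C::A.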

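Expressions

\[ \mathcal{T}[\![E_1 + E_2]\!] = \text{if } (\mathcal{T}[\![E_1]\!] = \mathit{int} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{int}) \text{ then } \mathit{int} \]

Rules for '-' and '*' are the same.

\[ \mathcal{T}[\![E_1 + E_2]\!] = \text{if } (\mathcal{T}[\![E_1]\!] = \mathit{real} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{int}) \text{ or } (\mathcal{T}[\![E_1]\!] = \mathcal{T}[\![E_2]\!] = \mathit{real}) \text{ or } (\mathcal{T}[\![E_1]\!] = \mathit{int} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{real}) \text{ then } \mathit{real} \text{ else fail} \]

\[ \mathcal{T}[\![E_1 / E_2]\!] = \text{if } \mathcal{T}[\![E_1]\!] = \mathit{int} \mid \mathit{real} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{int} \mid \mathit{real} \text{ then } \mathit{real} \text{ else fail} \]

\[ \mathcal{T}[\![E_1 \text{ and } E_2]\!] = \text{if } \mathcal{T}[\![E_1]\!] = \mathit{bool} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{bool} \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![E_1 \text{ or } E_2]\!] = \text{if } \mathcal{T}[\![E_1]\!] = \mathit{bool} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{bool} \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![\text{not } E]\!] = \text{if } \mathcal{T}[\![E]\!] = \mathit{bool} \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![E_1 < E_2]\!] = \text{if } \mathcal{T}[\![E_1]\!] = \mathit{int} \mid \mathit{real} \text{ and } \mathcal{T}[\![E_2]\!] = \mathit{int} \mid \mathit{real} \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![E_1 == E_2]\!] = \text{if } \mathcal{T}[\![E_1]\!] = C \text{ and } \mathcal{T}[\![E_2]\!] = C \text{ and } C \in O_C \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![O \in S]\!] = \text{if } \mathcal{T}[\![O]\!] = T \text{ and } \mathcal{T}[\![S]\!] = \{T_1\} \text{ and } T \preceq_o T_1 \text{ then } \mathit{bool} \text{ else fail} \]

\[ \mathcal{T}[\![S_1 = S_2]\!] = \text{if } \mathcal{T}[\![S_1]\!] = T_1 \text{ and } \mathcal{T}[\![S_2]\!] = T_2 \text{ and } (T_1 \preceq_o T_2 \text{ or } T_2 \preceq_o T_1) \text{ then } \mathit{bool} \text{ else fail} \]

Rule for operation '$\neq$' is the same.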
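The arithmetic and logical rules translate directly into a small Python function per operator group. The sketch below is illustrative only: types are plain strings, the function names are hypothetical, and it assumes that the real-valued rule shown for '+' applies to '-' and '*' as well.

```python
class TypeCheckError(Exception):
    """Raised when a rule reaches 'fail'."""

NUMERIC = {"int", "real"}

def type_arith(op, t1, t2):
    """Type of E1 op E2 for the arithmetic operators."""
    if op in ("+", "-", "*"):
        if t1 == "int" and t2 == "int":  # first rule, no else part
            return "int"
        if t1 in NUMERIC and t2 in NUMERIC:  # next rule: at least one real
            return "real"
        raise TypeCheckError(f"{op} expects numeric operands")
    if op == "/":
        if t1 in NUMERIC and t2 in NUMERIC:  # division always yields real
            return "real"
        raise TypeCheckError("/ expects numeric operands")
    raise TypeCheckError(f"unknown operator {op}")

def type_logical(t1, t2):
    """Type of 'E1 and E2' and 'E1 or E2'."""
    if t1 == "bool" and t2 == "bool":
        return "bool"
    raise TypeCheckError("logical operators expect bool operands")

def type_less(t1, t2):
    """Type of E1 < E2."""
    if t1 in NUMERIC and t2 in NUMERIC:
        return "bool"
    raise TypeCheckError("< expects numeric operands")
```

Note the order of the two tests for '+': the int rule is tried first and, because it has no else part, a mismatch simply passes control to the rule that yields real.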


Appendix C Short Glossary of Terms (Kratek pojmovni slovar)

attribute - An attribute describes some property of an object. [atribut]
attribute overriding - The value of an attribute defined by a class replaces the values of attributes with the same name that are defined higher in the class hierarchy. [nadomestitev, preglasovanje atributa]
behavioral inheritance - The subclasses of a given class inherit all methods defined by that class. [dedovanje strukture]
class - The basic building block of the object-oriented database model, serving as an abstract representation of a set of similar objects. [razred]
class extension - The set of object identifiers that represent the instances of a class. [ekstenzija razreda]
class interpretation - The set of instances of a given class. [interpretacija razreda]
class poset - A partially ordered set of classes. [delno urejena množica razredov]
classification - An abstraction that allows a set of objects to be represented by an abstract concept. [klasifikacija]
collection - An unordered collection of objects that may contain several equal objects. [zbirka, kolekcija]
complex object - A complex object is described by a data structure built by arbitrary use of the set and tuple constructors. The behavior of a complex object is represented by a set of methods. The term complex object is often used for entities represented with the object-oriented data model, semantic data models [38], or the nested relational model. [kompleksni objekt]
conceptual schema - The logical design of a database, expressed with the constructs of the data model. In an object-oriented database, the conceptual schema is a set of classes arranged in a hierarchy. [konceptualna shema]
data member - A data member of a class, used to describe a static property of the instances of the class. [podatkovni član (razreda), atribut]
data model - A data model is a formalism consisting of a notation for describing a database and a set of operations for working with the database. [podatkovni model]
database - A collection of data that represents a model of a selected part of the world. [podatkovna baza]
database algebra - An algebra for working with the elements of a database. [algebra podatkovne baze]
database management system - A system consisting of a set of interrelated data and the programs that enable efficient work with these data. [sistem za upravljanje s podatkovnimi bazami]
database model - [model podatkovne baze]
database programming language - A high-level programming language that contains constructs for working with objects in a database. [programski jezik za delo s podatkovnimi bazami]
dereferencing operator - An operator that maps a pointer to an object (an object identifier) to the state of the object. [operator dereferenciranja]
encapsulation - The separation of the external properties of an object, which are accessible to other objects, from its internal properties, which are hidden from other objects. [ograjevanje, enkapsulacija]
identity equality - Two objects are identical if they are identified by the same object identifier. [identičnost]
inheritance - The properties assigned to a class are passed on to all subclasses of that class. [dedovanje]
inheritance hierarchy - A hierarchy of classes that serves as the basis for the inheritance of properties. [hierarhija dedovanja]
interpretation - [interpretacija, tolmačenje]
instance - An element of the set of objects represented by a class; an instance of a class. [primerek, instanca]
member function - A construct of the C++ programming language that describes the behavior of the instances of a class. [članska funkcija (razreda)]
method overriding - A method defined in a class replaces the methods with the same name that are defined higher in the class hierarchy. [nadomestitev, preglasovanje metode]
multiple inheritance - A class may have more than one superclass and can therefore inherit the properties of several superclasses. [večkratno dedovanje]
nested relation model - An extension of the relational data model that allows the value of a relation attribute to be a relation. [model vgnezdenih relacij, NF² model]
NF² model - [model vgnezdenih relacij]
object - The basic building block of the object data model, used to represent the structure and behavior of a real-world thing or an abstract concept. [objekt, entiteta]
object algebra - An algebra for working with objects that supports the basic constructs of the object-oriented data model. [objektna algebra]
object state - The state of an object is a data structure that describes the static structure of the object, its properties, and its relationships with other objects in the database. [stanje objekta]
object identifier - An identifier that uniquely determines an object in the database. [objektni identifikator]
path expression - An expression that starts with a variable and continues with a list of attribute names separated by dots. [točkovni izraz]
persistent - [trajen, persistenten, stalen]
persistent variable - A variable whose value and identity survive the execution of the program that creates it. The variable can be used in later executions of the same program or of another program. [persistentna spremenljivka, trajna spremenljivka]
persistent programming language - A high-level programming language that contains constructs for defining persistent data structures and persistent variables, and statements for working with them. [persistenten programski jezik]
poset - [delno urejena množica]
query - [poizvedba, vprašanje]
query algebra - An algebra whose operations form the core of a query language. [povpraševalna algebra, algebra povpraševalnega jezika]
query language - A language for querying databases. A query language is, as a rule, declarative. [povpraševalni jezik, poizvedovalni jezik]
relational algebra - [relacijska algebra]
signature - A signature is an abstract description of a method. It consists of the set of type names of the method's parameters and the type name of the method's result. [signatura]
static type checking - Checking the agreement of types in programming-language expressions while the program is being translated into a lower-level programming language. [statično preverjanje tipov]
storage manager - A program that enables work with the data in a database. [shranjevalni upravljalnik]
structural inheritance - The static structure and properties assigned to a class are passed on to all subclasses of that class. [dedovanje strukture]
subclass - [podrazred]
subtype - [podtip]
superclass - [nadrazred]
supertype - [nadtip]
substitutability principle - A variable of type T can be assigned an object of type T or an instance of a subtype of type T. [princip zamenjave]
type - A type describes the structure and behavior of a set of objects. [tip]
type checking - Checking of types in programming-language expressions. [preverjanje tipov]
type constructor - A construct with which the structure of an object can be described. [gradnik tipa, konstruktor tipa]
type interpretation - The set of all instances of a type. [interpretacija tipa]
type lattice - [mreža tipov]
value equality - Two objects are equal if they have the same properties. [enakost, enakost po vrednosti]
