Answering Queries Using OQL View Expressions*

Louiqa Raschid
University of Maryland, College Park, MD 20742
http://www.umiacs.umd.edu/~louiqa

Daniela Florescu
INRIA, Rocquencourt, Le Chesnay, France
http://rodin.inria.fr

Patrick Valduriez
INRIA, Rocquencourt, Le Chesnay, France
http://rodin.inria.fr

* Research partially supported by the Advanced Research Project Agency under grant ARPA/ONR 92-J1929, the National Science Foundation under grant CDA9422138, and the Commission of European Communities under Esprit project IDEA.

Abstract

We propose a technique to answer OQL query expressions using OQL view expressions. The technique for reformulation is based on the extended pattern-matching of well-typed expressions. It uses an algorithm FindSubquery, which determines if the OQL query can be rewritten so that it uses the OQL view expression. If so, the query can be answered using the view. Our research has the following features: we handle queries and views that are general OQL expressions; we handle template views (that are expressed over variables); we can generalize a view over the type hierarchy; we handle disjunctions in the view; and we use integrity constraints while matching the query and the view.

1 Introduction

There has been much research on answering queries using relational views; the issues are summarized in [Mumick95]. However, there has been little research on the object data model and its query languages. The object model, e.g., ODMG, extends the relational model with features such as a type hierarchy, complex objects, operators, functions and methods. Object query languages, e.g., OQL [Cattell93], support well-typed expressions that are more general than SQL select-expressions. Interest in mediators and wrappers for heterogeneous information servers has increased the interest in answering queries using views for both object and relational databases [Qian96].

A translation approach to answering queries using views in the object model translates the schema, views and query into some extended Datalog-like representation. Answering a query using views can then be accomplished by applying the results from the relational model [Levy95a, Levy95b, Qian96]. Although conceptually elegant, this translation approach has several disadvantages. It requires translating into and out of Datalog. It is limited to those views that can be expressed in the representation. For example, nesting of structures, functions and queries, aggregation, arithmetic operators and type constructors are not easily represented in an extended Datalog. Finally, after back-translation from the logical representation, the equivalence of the queries in the object model must still be proved.

In this paper, we propose an alternative approach to directly answer OQL queries using OQL view expressions. Since OQL is complex, there are many syntactically different ways to write logically equivalent OQL expressions. Thus, we express queries and views in a canonical form, based on a strongly typed algebra. We then apply an algorithm FindSubquery, based on extended pattern-matching of well-typed expressions, which determines if the OQL query can be rewritten so that it uses the OQL view expression. If so, the query can be answered using the view. The research presented in this paper has the following features:

- The algorithm FindSubquery handles queries and views that are general OQL expressions; we are therefore able to handle queries that use functions, methods, quantifiers, etc.
- Our algorithm has the same complexity for those queries (the relational subset) that can be handled by a translation into an extended Datalog representation.
- OQL view expressions can be constructed over free variables; they are similar to templates describing the capability of a wrapper [Rajaraman95]. Our algorithm can match a query against such a view, and the query will provide bindings for the free variables in the view. This capability is useful for answering queries over heterogeneous information servers.
- Our algorithm utilizes the type hierarchy information of the object model. We find a sufficient condition under which an OQL view expression over the extent of a superclass can be used to answer an OQL query expression over the extent of a corresponding subclass.
- Our algorithm manipulates views that are OQL select-expressions whose where clause is a disjunction of conjuncts. We determine a sufficient condition to use such a view to answer an OQL query, when the query implies the view.
- Our algorithm determines a sufficient condition to utilize key constraints, defined for a class, to answer a query using a view, when the view does not include the instances of that class.

2 The data model and query language

2.1 The ODMG data model

The object data model of the ODMG standard [Cattell93] is based on atomic or structured types. The set of atomic types is the union of the set of predefined types, such as integer, boolean, string, and the particular set of object types for an application. Type constructors are the set, bag, list and tuple. Type expressions are constructed from atomic types, through the recursive application of type constructors.

An object interface specifies the properties (attributes and relationships) and operations or methods that are characteristic of the instances of each object type. A relationship is a reference-valued attribute. The interface may also specify a key constraint for each object type, and inverse links (for a relationship). The object types are organized along a subtype hierarchy in the usual manner. The set of all instances of a given object type and its subtypes is its extension. The extension can be explicitly named in the object type interface, and can be automatically maintained.

The set of operators includes built-in operators, user-defined functions and user-defined methods. The built-in operators are comparison and arithmetic operators, aggregation operators (e.g., count, min, max, sum, avg), set operators (e.g., union, except, intersect, flatten, element), list operators (e.g., append, first, last, nth), and the set membership operator (in). Special built-in operators are value constructors (e.g., set, bag, list and tuple constructors), field selection, quantifiers and select.

An object database is accessed through the set of named variables which are the entry points of the database. The extensions of object types are particular named variables. A database interface consists of a set of object type interfaces, and a set of named variables (with their types).

2.2 The OQL query language

The common query language used for expressing queries is OQL. Given an interface, OQL expressions are syntactically constructed in this interface by the recursive application of user-defined and built-in operations, functions and methods, starting with constants and variables. Each OQL expression is well-typed.

Definition 2.1 An OQL expression over a set of variables X is recursively defined as follows:

    expr: const                                              (constants)
        | var                                                (variables from X)
        | lambda_var                                         (in select or quantifier)
        | f( [expr [, expr]*] )                              (function application)
        | expr.method_name()                                 (method call)
        | expr.field_name                                    (field selection)
        | struct( field_name:expr [, field_name:expr]* )     (tuple constructor)
        | set( [expr [, expr]*] )                            (set constructor)
        | bag( [expr [, expr]*] )                            (bag constructor)
        | list( [expr [, expr]*] )                           (list constructor)
        | select [distinct] expr
          from lambda_var in expr [, lambda_var in expr]*
          [where expr]                                       (selection)
        | exists lambda_var in expr : expr                   (existential quantifier)
        | for all lambda_var in expr : expr                  (universal quantifier)

Definition 2.2 An OQL query against a database interface is a well-typed OQL expression over the set of named variables of this interface.

An OQL query is more general than a select-expression in SQL. However, the OQL select-expression is a built-in n-ary operator of particular importance, and we expect most of the queries to be expressed in this form. The expressions corresponding to each collection in the from clause, the predicate, and the projection of a select-expression may all be general OQL expressions. As a consequence, OQL allows navigation (following object identifiers), nested selects, dependent joins, quantified predicates and user-defined functions or methods to appear in all clauses of the select operator. A lambda variable is one that occurs in the from clause of a select-expression or in a quantified expression (for all or exists); its domain is a collection-valued expression. In this paper, we focus on queries and views which are OQL select-expressions.

3 Examples of answering queries using OQL view expressions

The example schema

We use the well-known example schema of [Cattell93], shown in Figure 1. Bold arcs represent the type hierarchy; directed arcs are relationships; bi-directional arcs are inverse links; double-headed arcs refer to set values. The extents of object types EMPLOYEE, PROFESSOR, STUDENT, etc., are employees, professors, students, etc.

Examples of views and queries

View V1 is a collection of type EMPLOYEE, for those employees whose salary is at least 30K. View V2 constructs a structure with two fields; they are the names and identifiers of full professors, and these fields comprise the key. View V3 constructs a structure of three fields, whose values are of type PROFESSOR, SECTION and STUDENT, respectively. V3 includes associate or full professors, the sections they teach, and the students in those sections. View V4 is a collection of type PROFESSOR and is a template view, where the rank and salary of the selected professors are passed as parameters to the view. The query Q selects a structure of two fields; they are the names of full professors whose salary is 50K and who teach a section of a database course, and the names of the students in these sections. The views and queries are as follows:

    V1: select x1
        from x1 in employees
        where x1.salary>=30,000

    V2: select struct(prof_name:x2.name, prof_id:x2.id)
        from x2 in professors
        where x2.rank="full"

    V3: select struct(prof:x3, sect:y3, stud:z3)
        from x3 in professors, y3 in x3.teaches, z3 in y3.is_taken_by
        where x3.rank="full" or x3.rank="associate"

    V4: select x4
        from x4 in professors
        where x4.rank=$1 and x4.salary=$2

    Q:  select struct(prof_name:x.name, stud_name:z.name)
        from x in professors, y in x.teaches, z in y.is_taken_by
        where x.rank="full" and x.salary=50,000 and y.is_section_of.name="Database"

We first convert these views to the canonical form, which is described later. Only view V3 and the query Q are changed. Note that we have simplified the from clause in both, so that the domains of the lambda variables are collections corresponding to extents. We have moved the dependent joins in the from clause to the where clause. The navigation (y.is_section_of.name) in Q has been transformed: a new variable t ranging over the extent courses is introduced, and there is an explicit join between y.is_section_of and t. The query and view in canonical form are as follows:

    V3: select struct(prof:x3, sect:y3, stud:z3)
        from x3 in professors, y3 in sections, z3 in students
        where (x3.rank="full" or x3.rank="associate") and y3 in x3.teaches and z3 in y3.is_taken_by

    Q:  select struct(prof_name:x.name, stud_name:z.name)
        from x in professors, y in sections, z in students, t in courses
        where x.rank="full" and x.salary=50,000 and t.name="Database" and y in x.teaches and z in y.is_taken_by and y.is_section_of=t

Examples of answering queries using views

We can use view V1 (employees whose salary is at least 30K) to answer the query. The expressions x in professors in Q and x1 in employees in V1 are matched, and produce the expression a in V1 in the from clause. We use the type hierarchy information to perform this match, and introduce an additional predicate, a in professors, to ensure that Q selects those employees in V1 who are also professors. A sufficient condition to perform this match is described later. The reformulated query is as follows:

    Q using V1:
        select struct(prof_name:a.name, stud_name:z.name)
        from a in V1, y in sections, z in students, t in courses
        where a in professors and a.rank="full" and a.salary=50,000 and t.name="Database" and y in a.teaches and z in y.is_taken_by and y.is_section_of=t

View V2 selects names and identifiers of full professors; these fields are also the combined key for PROFESSOR. V2 can be used as a filter for Q. The expressions x in professors in Q and x2 in professors in V2 are matched. V2 projects a structure that does not include the OIDs of professors. However, the instances of PROFESSOR must be accessed to obtain the salary. Consequently, to use this view we need to link the tuples of V2 and the instances of PROFESSOR over the key. Our algorithm determines a sufficient condition to obtain the following query:

    Q using V2:
        select struct(prof_name:x.name, stud_name:z.name)
        from a in V2, x in professors, y in sections, z in students, t in courses
        where x.salary=50,000 and t.name="Database" and y in x.teaches and z in y.is_taken_by and y.is_section_of=t and a.name=x.name and a.id=x.id

View V3 has instances of professor, student and section, and Q can be computed using V3. V3 is a disjunctive view, and the predicate x.rank="full" in Q implies the predicate x3.rank="full" or x3.rank="associate" of V3, if we substitute x/x3. Our algorithm finds a sufficient condition to obtain the following query:

    Q using V3:
        select struct(prof_name:a.prof.name, stud_name:a.stud.name)
        from a in V3, t in courses
        where a.prof.rank="full" and a.prof.salary=50,000 and t.name="Database" and a.sect.is_section_of=t

The template view V4 has variables $1 for rank and $2 for salary. We match the expressions x in professors in Q and x4 in professors in V4, and V4 is a direct subquery of Q, modulo the substitution θ = {$1/"full", $2/50,000}. Q can be rewritten using V4 as follows:

    Q using V4:
        select struct(prof_name:a.name, stud_name:z.name)
        from a in V4, y in sections, z in students, t in courses
        where t.name="Database" and y in a.teaches and z in y.is_taken_by and y.is_section_of=t

4 Methodology for manipulating OQL expressions

There are two main steps in answering OQL query expressions using OQL view expressions. The first step is a syntactic rewriting that converts OQL select-expressions into a canonical form, based on a strongly typed algebra. The second step is reformulation: procedure FindSubquery performs the reformulation to answer the query using the view. It is an extended pattern matching for well-typed OQL expressions. We note that the choice of the best reformulated query (in the canonical form representation), and the determination of an evaluation plan, are performed following classical optimization techniques.

4.1 Syntactic query rewriting

Two OQL expressions are logically equivalent [1] if they evaluate to the same result in all states of the database. However, they may be syntactically dissimilar. The generality of OQL queries increases the number of syntactically dissimilar ways in which they can be written. During query reformulation using pattern matching, it is important to identify logically equivalent select-expressions. Otherwise, a pattern matching procedure may determine that two expressions Q1 and Q2 do not match, although there may be an expression Q3 which is logically equivalent to Q2, and which matches Q1. [2]

Two select-expressions Q1 and Q2 are logically equivalent if and only if their canonical form representations are identical. The canonical form representation is based on a strongly typed object algebra, which is presented in [Florescu96b]. It is similar in spirit to the canonical form for nested SQL queries presented in [Kim82].

Specifying such a canonical form is an extremely hard problem. In order to resolve all syntactic dissimilarities, one must integrate all the algebraic properties of the built-in operators and user-defined methods (e.g., the commutativity of the addition operator, the associativity of the intersection operator, the distributivity of the select over the union, the neutral element of the empty set for the union operator, etc.). Our solution is to find a relaxed canonical form. This relaxed canonical form may not detect all syntactic dissimilarities, but will reduce the possibility of not identifying a logical equivalence. The effectiveness of this relaxed canonical form depends on the completeness of the compiler with regard to all the possible algebraic properties of the operators and methods.

We define a relaxed canonical form for an OQL select-expression satisfying the following properties:

- the predicate expression in the where clause is in conjunctive normal form;
- existential quantifiers in the predicate of the where clause are eliminated, whenever possible;
- there are no nested select-expressions in the from clause;
- particular cases of nested select-expressions occurring in the where clause are eliminated; examples are testing membership in the result of a nested select-expression, or testing that the result of a nested select-expression is empty;
- dependencies between different collections (from clause) are eliminated, if possible;
- navigation within complex objects (the so-called functional joins) is transformed into explicit joins whenever possible. The criterion for elimination is that the corresponding class extent is in the interface;
- if it is possible to deduce, based on the key, that the result cannot contain duplicates, then the distinct clause is explicitly introduced in the select-expression.

The existence of duplicates is important when matching a query with a view. Duplicates are also important when eliminating existential quantifiers, or when manipulating expressions involving object references. The compiler uses a set of syntactic rewrite rules for each transformation. An OQL select-expression is converted to its canonical form by applying these syntactic rewrite rules, in any order, until saturation. The set of rules can be easily extended, as needed.
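The idea of applying syntactic rewrite rules until saturation can be illustrated with a minimal Python sketch that brings a where-clause predicate into conjunctive normal form, the first property of the relaxed canonical form. The tuple encoding of predicates and the two rules (push_not, distribute_or) are assumptions made for this illustration; they are not the rule set of the compiler described in this paper.

```python
# Predicates are nested tuples ("and", a, b), ("or", a, b), ("not", a),
# or atoms written as plain strings. This encoding is hypothetical.

def push_not(e):
    """Rule: remove double negation and apply De Morgan's laws."""
    if isinstance(e, tuple) and e[0] == "not" and isinstance(e[1], tuple):
        inner = e[1]
        if inner[0] == "not":
            return inner[1]
        if inner[0] == "and":
            return ("or", ("not", inner[1]), ("not", inner[2]))
        if inner[0] == "or":
            return ("and", ("not", inner[1]), ("not", inner[2]))
    return e

def distribute_or(e):
    """Rule: a or (b and c)  ->  (a or b) and (a or c), and symmetrically."""
    if isinstance(e, tuple) and e[0] == "or":
        a, b = e[1], e[2]
        if isinstance(b, tuple) and b[0] == "and":
            return ("and", ("or", a, b[1]), ("or", a, b[2]))
        if isinstance(a, tuple) and a[0] == "and":
            return ("and", ("or", a[1], b), ("or", a[2], b))
    return e

RULES = [push_not, distribute_or]  # the rule set can be extended as needed

def rewrite_once(e):
    """One bottom-up pass: rewrite the children, then try every rule on the node."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(rewrite_once(c) for c in e[1:])
    for rule in RULES:
        e = rule(e)
    return e

def saturate(e):
    """Apply the rules until no rule changes the expression any more."""
    while True:
        nxt = rewrite_once(e)
        if nxt == e:
            return e
        e = nxt

cnf = saturate(("or", "p", ("and", "q", "r")))
neg = saturate(("not", ("or", "p", "q")))
```

Here cnf is the conjunction of (p or q) and (p or r), and neg is the conjunction of (not p) and (not q). A real rule set would also cover quantifier elimination, unnesting and the introduction of explicit joins, which this sketch does not attempt.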



[1] We presume that logical equivalence is undecidable, but we have not provided a proof in this paper.
[2] Such problems do not usually arise with Datalog-like queries.

4.2 OQL query rewriting using OQL view expressions

Given a query Q, and a stored view V defined by the query Q', we want to determine if it is possible to rewrite the query Q as a nested query Q'', equivalent to Q, which contains the view definition Q' as a subexpression. If successful, the subquery Q' of Q'' can be replaced by the view V, and the query Q'', equivalent to query Q, can be computed using this view V.

Given two select-expressions Q and Q', procedure FindSubquery either fails or, if it succeeds, returns a query Q'', logically equivalent to Q, and a substitution θ; further, Q'' contains Q' as a subexpression.

A substitution for the set of variables X = {v1, ..., vn} is a finite ordered set θ of the form {v1/e1, ..., vn/en}, where each ei is an OQL expression distinct from vi, but with the same type as vi. Let θ = {v1/e1, ..., vn/en} be a substitution, and E be an OQL expression. Consider the expressions E_0, E_1, ..., E_n, where E_0 = E and E_i is obtained from E_{i-1} by replacing each occurrence of variable vi in E_{i-1} by ei. E_n is called the instance of E by the substitution θ, and it is denoted Eθ.

Procedure FindSubquery uses three algorithms: Match(), Implies() and Included().

Match(): An OQL expression E1 matches another expression E2 if there exists a substitution θ such that E1θ = E2. Given two expressions, the procedure Match() either fails, or succeeds and returns a substitution θ such that E1θ = E2. Match() uses a classic pattern matching technique and identifies sufficient, but not always necessary, conditions to obtain a substitution. It exploits the typing of OQL expressions, and the algebraic properties of the operators, to increase the efficiency of pattern matching. Match() utilizes the commutativity and associativity of the built-in operators. However, we cannot guarantee that all potential rewritings of the OQL select-expressions that would facilitate a match are considered, since to do so, all algebraic properties of user-defined functions and methods would be needed.

Included(): Given two set-valued expressions C1 and C2 (not necessarily of the same type), Included() determines a sufficient condition which assures that the values for C1 will always be included in the values for C2, independent of the particular state of the database. It utilizes static, semantic and type information. As an example of static information, the expression y.teaches, for y of type PROFESSOR, is of type set(SECTION). Independent of the particular instances of y, y.teaches is always included in the extent sections of class SECTION. We also utilize type information. The class PROFESSOR is a subclass of EMPLOYEE; consequently, the value of the (extent) expression professors is always included in the value of the (extent) expression employees.

Implies(): Given two boolean expressions p1 and p2, Implies() determines a sufficient condition so that the boolean value to which p1 evaluates will imply (in the boolean algebra) the boolean value to which p2 evaluates, independent of the actual values. For example, the value of the expression x.rank="full" implies the value of the expression x.rank="full" or x.rank="associate", where x is of type PROFESSOR. Boolean rules and semantic knowledge may be used here.

We reiterate that these three algorithms, Match(), Implies() and Included(), use static and type information, and semantic knowledge, to determine sufficient but not always necessary conditions for two OQL select-expressions to match, for the values of one expression to be implied by another, or for the values of one expression to be included in another, respectively.

The complete algorithm for procedure FindSubquery is in [Florescu96a]. Here, we explain the functioning of the algorithm. Consider two input select-expressions, Q' and Q, as follows:

    Q' = select proj1
         from x11 in C11, ..., x1n in C1n
         where p11 and p12 and ... and p1q

    Q  = select proj2
         from x21 in C21, ..., x2m in C2m
         where p21 and p22 and ... and p2r
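The definitions of a substitution and of Match() given in this section can be made concrete with a small Python sketch. It applies a substitution pair by pair, as in the definition of the instance of E, and performs plain one-way structural matching. The tuple encoding, the function names apply_subst and match, and the treatment of $1 and $2 as ordinary variables are assumptions for this illustration; the sketch ignores typing, commutativity and associativity, all of which the Match() procedure of the paper exploits.

```python
# Expressions are nested tuples (operator, arg, ...) or atoms (str, int).
# This encoding is hypothetical and used only for illustration.

def _replace(expr, var, repl):
    """Replace every occurrence of variable var in expr by repl."""
    if expr == var:
        return repl
    if isinstance(expr, tuple):
        # Keep the operator name, rewrite the arguments.
        return (expr[0],) + tuple(_replace(c, var, repl) for c in expr[1:])
    return expr

def apply_subst(expr, theta):
    """Instance of expr by theta, an ordered list of (variable, expression)
    pairs that are applied one after the other."""
    for var, repl in theta:
        expr = _replace(expr, var, repl)
    return expr

def match(pattern, target, variables, theta=None):
    """Return a dict theta with pattern instantiated by theta equal to target,
    or None if plain structural matching fails (a sufficient test only)."""
    theta = dict(theta or {})
    if pattern in variables:
        if pattern in theta:
            return theta if theta[pattern] == target else None
        theta[pattern] = target
        return theta
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if len(pattern) != len(target) or pattern[0] != target[0]:
            return None
        for p, t in zip(pattern[1:], target[1:]):
            theta = match(p, t, variables, theta)
            if theta is None:
                return None
        return theta
    return theta if pattern == target else None

# Where clause of the template view V4 and the two corresponding conjuncts of Q.
V4_WHERE = ("and", ("=", ("field", "x4", "rank"), "$1"),
                   ("=", ("field", "x4", "salary"), "$2"))
Q_WHERE = ("and", ("=", ("field", "x", "rank"), "full"),
                  ("=", ("field", "x", "salary"), 50000))

theta = match(V4_WHERE, Q_WHERE, {"x4", "$1", "$2"})
```

In this example theta binds x4 to x, $1 to "full" and $2 to 50000, and applying it to V4_WHERE with apply_subst yields Q_WHERE, which mirrors the way the query supplies bindings for the free variables of a template view.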

FindSubquery works as follows. First, each collection C1i in the from clause of Q' is paired with some collection C2j in Q. The pairing is tested by Match() and/or Included(). If the pairing succeeds, the resulting substitution is added to the global substitution, together with the binding x1i/x2j of the corresponding variables. If, for some collection of Q', none of the collections of Q is found to pair successfully, then the algorithm fails. However, some of the collections of Q may not successfully pair with any of the collections of Q'. These collections, together with the corresponding variables, cannot be eliminated and must appear in the from clause of the final query Q''.

Next, each conjunct p1i in the predicate of Q' is paired with some conjunct p2j in Q. The pairing is tested by Match() and/or Implies(). If, for some conjunct in the predicate of Q', none of the conjuncts of Q pairs successfully, then the algorithm fails. However, some of the conjuncts of Q may not pair successfully with any of the conjuncts of Q'. These conjuncts cannot be eliminated and must appear in the where clause of the final query Q''.

The resulting query Q'' is produced from Q as follows. A new collection, corresponding to query Q', is added to the from clause of Q''. Some collections in the from clause of Q need not appear in Q'', since the corresponding conditions and projections are already included in Q', but some other collections cannot be eliminated and must appear in Q''. There are two criteria for a collection to appear in Q'': either the corresponding variable appears in some unmatched predicates or in the projection, or the corresponding variable appears in the expression of another collection which must appear in Q''. [3] Thus, some collections may appear in Q' and also appear in the from clause of Q''. In this case, these collections appear twice in the from clause of Q'', with two different variables ranging over them. An additional predicate must then be added to the where clause of Q'' in order to link these variables. This link, usually based on the key constraint, assures the equivalence of the two queries Q and Q''.

[3] This is possible because of dependent joins.
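The pairing of collections and conjuncts described above can be sketched in Python as follows. This is a deliberately reduced illustration, not the algorithm of [Florescu96a]: the Select class, the SUPER_EXTENTS table and the simple included() and implies() tests are assumptions made for the example, and the construction of the final from clause and of key-based links is omitted. The function returns the variable binding and the conjuncts that must remain in the where clause of the rewritten query, or None when the pairing fails.

```python
from dataclasses import dataclass

# Hypothetical static knowledge: each extent maps to the extents that contain it.
SUPER_EXTENTS = {"professors": {"professors", "employees"}}

def included(c_query, c_view):
    """Sufficient test that c_query is contained in c_view in every state."""
    return c_view in SUPER_EXTENTS.get(c_query, {c_query})

def implies(p_query, p_view):
    """Sufficient test that p_query implies p_view."""
    if p_query == p_view:
        return True
    if p_view[0] == "or":                      # p_query is one of the disjuncts
        return p_query in p_view[1:]
    if (p_query[0] == "=" and p_view[0] == ">=" and p_query[1] == p_view[1]
            and isinstance(p_query[2], (int, float))
            and isinstance(p_view[2], (int, float))):
        return p_query[2] >= p_view[2]         # t = c implies t >= k when c >= k
    return False

def rename(pred, binding):
    """Rename the view variables in pred to the paired query variables."""
    if isinstance(pred, tuple):
        return tuple(rename(p, binding) for p in pred)
    return binding.get(pred, pred)

@dataclass
class Select:
    frm: list    # [(lambda variable, collection name)]
    where: list  # conjuncts of the where clause, as nested tuples

def find_subquery(view, query):
    binding, extra = {}, []
    for v_var, v_coll in view.frm:             # pair every view collection
        partner = next(((q_var, q_coll) for q_var, q_coll in query.frm
                        if q_var not in binding.values()
                        and included(q_coll, v_coll)), None)
        if partner is None:
            return None
        q_var, q_coll = partner
        binding[v_var] = q_var
        if q_coll != v_coll:                   # paired by inclusion only:
            extra.append(("in", q_var, q_coll))  # keep a membership test
    residual = list(query.where)
    for p_view in view.where:                  # pair every view conjunct
        p = rename(p_view, binding)
        hit = next((q for q in query.where if implies(q, p)), None)
        if hit is None:
            return None
        if hit == p and hit in residual:       # enforced exactly by the view
            residual.remove(hit)
    return binding, residual + extra

V1 = Select([("x1", "employees")], [(">=", ("field", "x1", "salary"), 30000)])
Q = Select([("x", "professors")],
           [("=", ("field", "x", "rank"), "full"),
            ("=", ("field", "x", "salary"), 50000)])
result = find_subquery(V1, Q)
```

For V1 and this reduced form of Q, the sketch binds x1 to x, keeps both conjuncts of Q (neither is enforced exactly by the view) and adds the membership test x in professors, in line with the reformulated query shown for V1 in Section 3.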

5 Conclusions

We have presented a technique for answering OQL query expressions using OQL view expressions. Reformulated OQL expressions in a relaxed canonical form are processed by a physical cost-based optimizer which selects an optimal query and an evaluation plan. This has been implemented within the Flora compiler-optimizer prototype, which supports ODMG/OQL. The prototype was successfully demonstrated at the EDBT 1996 conference, and a simplified version of the prototype can be accessed at http://rodin.inria.fr/demos/optimizer.

Procedures Match(), Implies() and Included() of the algorithm FindSubquery are all limited by the relaxed canonical form. All the algebraic properties of the user-defined operations and methods will be needed to reduce the number of equivalences that are missed. Further, the lack of integrity constraints may prevent the algorithm from identifying all logical equivalences. Suppose we consider any of the inverse links in our example schema, and suppose that the query used one link whereas the view was expressed using the inverse link. Unless we included an integrity constraint capturing the semantic information of the inverse link, algorithm FindSubquery would not be able to rewrite the query using the view, although the two subexpressions may be logically equivalent.

References

[Cattell93] R.G.G. Cattell et al., The Object Database Standard - ODMG 93. Morgan Kaufmann, 1993.

[Florescu94] D. Florescu and P. Valduriez, "Rule-based Query Processing in the IDEA System." Int. Symp. on Adv. Database Tech. and Integration, Nara, Japan, October 1994.

[Florescu96a] D. Florescu, L. Raschid and P. Valduriez, "A Methodology for Query Reformulation in CIS Using Semantic Knowledge." Journal of Coop. Information Systems, to appear, 1996.

[Florescu96b] D. Florescu, Ph.D. dissertation, in preparation, INRIA, 1996.

[Kim82] W. Kim, "On Optimizing an SQL-like Nested Query." ACM TODS, 7(3), pages 443-469, September 1982.

[Levy95a] A.Y. Levy, D. Srivastava and T. Kirk, "Data Model and Query Evaluation in Global Information Systems." Int. Journal on Intelligent Inf. Systems, 1995.

[Levy95b] A.Y. Levy, A.O. Mendelzon, Y. Sagiv and D. Srivastava, "Answering Queries Using Views." Proc. of the ACM PODS Symp., 1995.

[Mumick95] I.S. Mumick, "The Rejuvenation of Materialized Views." Presented at CISMOD 95.

[Qian96] X. Qian, "Query Folding." Intl. Conf. on Data Engg., 1996.

[Rajaraman95] A. Rajaraman, Y. Sagiv and J.D. Ullman, "Answering Queries Using Templates." Proc. of the ACM PODS Symp., 1995.

Figure 1: Example object schema. [Diagram not reproduced. Object types shown: COURSE (name, number), SECTION (number), STUDENT (student_id), EMPLOYEE (id, salary), PROFESSOR (rank) and TA. Relationship labels shown: has_prerequisites, is_prerequisite_for, has_sections, is_section_of, takes, is_taken_by, teaches, is_taught_by, assists, has_TA.]
