Recursive Queries Using Object Relational Mapping

Marta Burzańska¹, Krzysztof Stencel¹,², Patrycja Suchomska¹, Aneta Szumowska¹, and Piotr Wiśniewski¹

¹ Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
² Institute of Informatics, University of Warsaw, Warsaw, Poland
{quintria,stencel,papi,iriz,pikonrad}@mat.umk.pl

Abstract. Recent years have witnessed continuous development of database query languages and object-relational mappers. One research field of interest is recursive queries. The first implementation of such queries for SQL was introduced by Oracle in 1985. However, it was the introduction of recursive Common Table Expressions into the SQL:99 standard that made research on this topic more popular. Currently most popular DBMSs implement recursive queries, but there are no object-relational mappers that support them. In this paper we propose extending existing ORMs with support for recursive CTEs. A prototype of such an extension has been implemented in the SQLObject mapper for the Python language. Tests have been conducted with the PostgreSQL 8.4 database. Furthermore, recursive queries written using CTEs tend to be complex and hard to comprehend. Our proposal overcomes this problem by pushing the formulation of recursive queries to a higher abstraction level, which makes them significantly simpler to write and to read.

1 Introduction

Since Codd's 1970 publication [11], the relational data model has become the main standard for permanent data storage. Despite many attempts to introduce new standards such as object algebra, nothing suggests that the leading role of relational database systems is at risk. Modern relational DBMSs offer extensions to the relational model that considerably widen their functionality. The research on such extensions has been greatly influenced by the need to express special data like bills of material, corporate hierarchies or graph structures. Querying those structures is especially interesting for the authors of this paper.

Recent years have shown an increase in research on recursive query processing. The paper [1] discusses the current state of the art in the field of SQL recursive Common Table Expressions. Most of the research around recursive queries is focused on efficiency. The papers [7,2,8] discuss different aspects of optimisation, from query rewriting, through special on-the-fly placement of indexes during execution, to modifiable execution plans.

T.-h. Kim et al. (Eds.): FGIT 2010, LNCS 6485, pp. 42–50, 2010. © Springer-Verlag Berlin Heidelberg 2010


SQL's recursive queries, despite their expressiveness, suffer from a big disadvantage: their construction is often too complicated for an average database user. Also, if a developer using an ORM wants to benefit from recursive queries, the queries have to be written by hand in pure SQL and explicitly passed to the database.

The contribution of this paper is thus twofold. First, we propose a solution to the problem of recursive querying in object-relational mappers. Second, we combine recursive queries with the simplicity of programming in Python. The user is only required to provide some basic information, and the tool provided by the authors generates a proper query and passes it for execution. The resulting query formulation is significantly more comprehensible than one written directly in SQL. Further reasons for considering these queries, and the methods of their implementation in example systems, are covered in Section 2.

Nowadays the market of database applications is dominated by object-oriented languages with relational DBMSs used as a back-end. Combining those two different approaches results in a range of problems known as the object-relational impedance mismatch. One attempt to solve those problems is object-relational mapping tools [9,10]. They are discussed in Section 3. In Section 4 we recall basic information on the technologies employed: Python and SQLObject. Section 5 presents the proposed solution to the problem of recursive queries in object-relational mappers. Section 6 concludes.

2 Two Motivating Examples

Let us consider the following situation. A standard example of a relation describing a hierarchical structure is a corporate hierarchy representation. This relation joins employees with their supervisors. Let us assume we have the following relation empl:

Example 1. Relation with employees

     id | first_name | last_name | boss_id
    ----+------------+-----------+---------
      1 | John       | Travolta  |
      2 | Bruce      | Willis    |       1
      3 | Marilyn    | Monroe    |
      4 | Angelina   | Jolie     |       3
      5 | Brad       | Pitt      |       4
      6 | Hugh       | Grant     |       4
      7 | Colin      | Firth     |       3
      8 | Keira      | Knightley |       6
      9 | Sean       | Connery   |       2
     10 | Pierce     | Brosnan   |       3
     11 | Roger      | Moore     |       9
     12 | Timothy    | Dalton    |       9
     13 | Daniel     | Craig     |       1
     14 | George     | Lazenby   |       5
     15 | Gerard     | Butler    |       6

We would like to obtain information about all of Travolta's direct and indirect subordinates. To acquire such data we first ought to find Travolta in the database. Then we should search for all his subordinates, next their subordinates, etc. Using basic SQL:92 constructions such a query would take the form of a union of a series of queries:

    SELECT id, first_name, last_name, boss_id
    FROM empl
    WHERE last_name = 'Travolta'
    UNION
    SELECT e.id, e.first_name, e.last_name, e.boss_id
    FROM empl e JOIN empl c ON (e.boss_id = c.id)
    WHERE c.last_name = 'Travolta'
    UNION
    SELECT e.id, e.first_name, e.last_name, e.boss_id
    FROM empl e JOIN empl c_l1 ON (e.boss_id = c_l1.id)
                JOIN empl c ON (c_l1.boss_id = c.id)
    WHERE c.last_name = 'Travolta'
    UNION
    ...

Such a construction has an obvious drawback: we have to know the depth of the hierarchy. Without this knowledge it is an impossible task. Fortunately the SQL:99 standard provides means of expressing the request for subordinates without this limitation. Those means are recursive views and recursive Common Table Expressions. The following example presents a query searching for Travolta's subordinates with the help of a recursive CTE:

Example 2. Information about Travolta and all his direct and indirect subordinates.

    WITH RECURSIVE rec_empl AS (
        SELECT id, first_name, last_name, boss_id
        FROM empl
        WHERE last_name = 'Travolta'
      UNION
        SELECT e.id, e.first_name, e.last_name, e.boss_id
        FROM empl e, rec_empl r
        WHERE e.boss_id = r.id
    )
    SELECT * FROM rec_empl

Example 1 presents hierarchical data which does not contain cycles. The next example considers a situation that includes natural cycles. Let us now consider the following relation describing connections between cities:


     id | city_start | city_end | travel_time
    ----+------------+----------+-------------
      1 | Torun      | Warsaw   |        2.45
      2 | Warsaw     | Krakow   |         4.3
      3 | Torun      | Gdansk   |         2.2
      4 | Torun      | Olsztyn  |         1.8
      5 | Warsaw     | Kutno    |         1.3
      6 | Kutno      | Warsaw   |         1.3
      7 | Olsztyn    | Torun    |         1.8
      8 | Gdansk     | Warsaw   |           4
      9 | Kutno      | Torun    |         1.2
     10 | Lodz       | Krakow   |         3.5

For such structures it is natural to ask about connections between two cities while placing a restriction on the number of transport changes. For example, we may ask about a connection from Torun to Zakopane with three transport changes at most. Unlike the previous example, this query may be expressed both using a recursive CTE and using a union of three queries. The union construction is fairly simple but lengthy, thus it is omitted in this paper. The following query presents the described connections request using a recursive CTE:

Example 3. Connections from Torun to Zakopane with a list of transport change cities and total time of travel (excluding time of waiting).

    WITH RECURSIVE rcte AS (
        SELECT conns.id, conns.city_end, conns.travel_time, conns.city_start,
               ' ' || text(conns.city_end) AS concat_city_end,
               ' ' || text(conns.id) AS concat_id,
               ' ' || text(conns.city_start) AS concat_city_start,
               conns.city_start AS constant_city_start,
               conns.travel_time AS sum_of_travel_time,
               1 AS level
        FROM conns
        WHERE conns.city_start = 'Torun'
      UNION
        SELECT conns.id, conns.city_end, conns.travel_time, conns.city_start,
               text(rcte.concat_city_end || ', ' || conns.city_end) AS concat_city_end,
               text(rcte.concat_id || ', ' || conns.id) AS concat_id,
               text(rcte.concat_city_start || ', ' || conns.city_start) AS concat_city_start,
               rcte.constant_city_start,
               rcte.sum_of_travel_time + conns.travel_time AS sum_of_travel_time,
               rcte.level + 1 AS level
        FROM conns, rcte
        WHERE rcte.city_end = conns.city_start
          AND rcte.level < 4
          AND rcte.concat_city_start NOT LIKE '% ' || conns.city_start || '%'
    )
    SELECT * FROM rcte WHERE rcte.city_end = 'Zakopane'
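To make the bounded path search of Example 3 easier to experiment with, the following is a minimal sketch that runs a simplified variant of the query through Python's standard sqlite3 module. Several things here are assumptions made for illustration and are not part of the paper: SQLite is used instead of PostgreSQL, a single path column replaces the three concatenated columns, the cycle check tests the city being entered against the path so far, and the helper name find_routes is hypothetical. The destination is Krakow because the sample rows above contain no connection to Zakopane.

```python
import sqlite3

# Rows of the conns relation: (id, city_start, city_end, travel_time).
CONNS_ROWS = [
    (1, "Torun", "Warsaw", 2.45), (2, "Warsaw", "Krakow", 4.3),
    (3, "Torun", "Gdansk", 2.2), (4, "Torun", "Olsztyn", 1.8),
    (5, "Warsaw", "Kutno", 1.3), (6, "Kutno", "Warsaw", 1.3),
    (7, "Olsztyn", "Torun", 1.8), (8, "Gdansk", "Warsaw", 4.0),
    (9, "Kutno", "Torun", 1.2), (10, "Lodz", "Krakow", 3.5),
]

# Simplified variant of Example 3 (assumption: SQLite dialect).
# instr() is a plain substring test, which is safe here only because
# no city name in the sample data is contained in another one.
ROUTES_QUERY = """
WITH RECURSIVE rcte(city_end, path, sum_of_travel_time, level) AS (
    SELECT city_end, city_start || ', ' || city_end, travel_time, 1
    FROM conns
    WHERE city_start = :start
  UNION
    SELECT c.city_end, r.path || ', ' || c.city_end,
           r.sum_of_travel_time + c.travel_time, r.level + 1
    FROM conns c JOIN rcte r ON r.city_end = c.city_start
    WHERE r.level < :max_level
      AND instr(r.path, c.city_end) = 0
)
SELECT path, sum_of_travel_time, level
FROM rcte
WHERE city_end = :goal
ORDER BY level
"""


def find_routes(start, goal, max_level=4):
    """Return (path, total_time, level) tuples for routes from start to goal."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE conns (id INTEGER PRIMARY KEY, city_start TEXT, "
            "city_end TEXT, travel_time REAL)"
        )
        conn.executemany("INSERT INTO conns VALUES (?, ?, ?, ?)", CONNS_ROWS)
        params = {"start": start, "goal": goal, "max_level": max_level}
        return conn.execute(ROUTES_QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for path, total_time, level in find_routes("Torun", "Krakow"):
        print(level, path, round(total_time, 2))
```

The level bound plays the same role as rcte.level < 4 in Example 3: it caps the number of legs of a route, so the recursion terminates even though the data contains cycles such as Warsaw, Kutno, Warsaw.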


The examples given above present recursive queries expressed in the PostgreSQL dialect [6]. For other database systems the notation may be slightly different [1]. Example 3 shows that recursive SQL queries may easily become difficult for a developer to understand and maintain. Writing such complicated queries again in each application is also a major inconvenience. This is why we propose a solution that allows the developer to focus on the essence of the queried data, while leaving the task of generating an appropriate query to the system.
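As a small illustration of the remark on dialects, the query of Example 2 happens to run unchanged on SQLite, which makes it easy to try out with Python's standard sqlite3 module. The sketch below loads the empl relation of Example 1 into an in-memory database and executes the recursive CTE. The use of SQLite, the bound parameter in place of the literal 'Travolta', the added ORDER BY and the helper name subordinates_of are assumptions made for this sketch; the paper's own tests used PostgreSQL 8.4.

```python
import sqlite3

# Rows of the empl relation from Example 1:
# (id, first_name, last_name, boss_id); None marks an employee without a boss.
EMPL_ROWS = [
    (1, "John", "Travolta", None), (2, "Bruce", "Willis", 1),
    (3, "Marilyn", "Monroe", None), (4, "Angelina", "Jolie", 3),
    (5, "Brad", "Pitt", 4), (6, "Hugh", "Grant", 4),
    (7, "Colin", "Firth", 3), (8, "Keira", "Knightley", 6),
    (9, "Sean", "Connery", 2), (10, "Pierce", "Brosnan", 3),
    (11, "Roger", "Moore", 9), (12, "Timothy", "Dalton", 9),
    (13, "Daniel", "Craig", 1), (14, "George", "Lazenby", 5),
    (15, "Gerard", "Butler", 6),
]

# The CTE of Example 2; only the outer SELECT is narrowed to last_name
# and ordered by id so that the result is deterministic.
RECURSIVE_QUERY = """
WITH RECURSIVE rec_empl AS (
    SELECT id, first_name, last_name, boss_id
    FROM empl
    WHERE last_name = ?
  UNION
    SELECT e.id, e.first_name, e.last_name, e.boss_id
    FROM empl e, rec_empl r
    WHERE e.boss_id = r.id
)
SELECT last_name FROM rec_empl ORDER BY id
"""


def subordinates_of(last_name):
    """Return the last names of an employee and all of their subordinates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE empl (id INTEGER PRIMARY KEY, first_name TEXT, "
            "last_name TEXT, boss_id INTEGER REFERENCES empl(id))"
        )
        conn.executemany("INSERT INTO empl VALUES (?, ?, ?, ?)", EMPL_ROWS)
        return [row[0] for row in conn.execute(RECURSIVE_QUERY, (last_name,))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(subordinates_of("Travolta"))
```

No depth has to be known in advance: the recursion simply stops when an iteration finds no further subordinates.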

3 Object-Relational Mappers

Modern database applications are mostly implemented using object-oriented languages. At the same time, the back-ends for such applications are relational database systems. This creates a problem of translating information stored in objects into relational tuples that can be stored inside a relational database. This problem, more generally known as the impedance mismatch, has been thoroughly analysed for different approaches [9,10] and will not be discussed in this paper. Instead, we shall focus on one of its aspects: a method of expressing SQL queries inside the language used to build the application.

This issue becomes even more troublesome when we permit connections to various databases using various SQL dialects. The tool used to bridge the gap between relational SQL and an object-oriented programming language should be aware of the differences among dialects, yet they should be irrelevant to the application developer using such a tool. For example, there are many different JDBC drivers supporting various DBMSs. Because of the variety of dialects, a JDBC driver should implement a method jdbcCompliant() [3] that tests the driver for compliance with the JDBC API and the SQL-92 standard. This means that each driver ought to restrict the usage of SQL constructions to only those supported by the SQL-92 standard. Bearing in mind the rapid development of databases and programming languages in the last 15 years, this approach should be updated and SQL-92 should be replaced in this matter with a newer standard.

The ORM developers decided on a different approach. SQL queries are automatically generated according to the syntax of the specific dialect of the chosen DBMS. The mapper should have a set of rules describing how classes correspond to relations, fields to relevant attributes, instances to tuples and references to foreign keys. Each supported SQL dialect should be described this way. The end application developer does not need to be aware of those rules, and the data manipulation mechanisms used by the developer should be independent of the choice of the DBMS. Such solutions are possible for general applications because every modern relational DBMS is based on the same data model and data access mechanism. Differences between DBMSs include their approach to specific optimisation methods, scalability and supported extensions. However, these aspects are not important in the context of this paper.

4 SQLObject and Python

One of the choices to be made by the authors was to select a base language for the experimental implementation. The choice was made based on the availability of ORMs, the firm design and the popularity of a language. Having considered many different languages, the authors decided to use Python. The main reasons behind this choice are the exceptional ease of prototype development and access to the source code of ORM implementations. Also, Python's biggest drawback, its performance, has no significant impact on this work.

While choosing the base language the authors also considered different ORMs. The prototype implementation could have required altering the main ORM source code, thus the authors decided to focus on open-source mappers. Among Python's object-relational mappers, the SQLObject ORM [4] has been chosen as the most developer-friendly. We will now briefly describe this ORM.

In SQLObject the configuration for specific data is set up using inheritance from a special class simply called SQLObject. Example 4 presents a configuration of an SQLObject mapping.

Example 4. Classes for the empl and conns tables will be formed respectively:

    from sqlobject import SQLObject, StringCol, FloatCol, ForeignKey

    class Conns(SQLObject):
        cityStart = StringCol()
        cityEnd = StringCol()
        travelTime = FloatCol()

    class Empl(SQLObject):
        firstName = StringCol()
        lastName = StringCol()
        boss = ForeignKey('Empl', default=None)

Example 5. Employees with the last name "Craig" and the last names of their bosses:

    craigs = Empl.select(Empl.q.lastName == "Craig")
    for e in craigs:
        print(e.boss.lastName)

Example 5 uses the so-called q-magic. It is an SQLObject technique that allows search conditions to be defined in the Python language. Such a condition is automatically mapped to SQL when needed. The Empl.q.lastName object is a sub-object of the Empl class. It represents the information about the last name attribute of the empl database relation. More detailed information about the q-magic technique can be found at [5].
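The idea behind q-magic, a Python expression that is turned into SQL only when needed, can be sketched with ordinary operator overloading. The code below is not SQLObject's implementation and does not use its API; the classes Condition, Field and QNamespace are hypothetical names introduced only to show how a comparison such as q.lastName == "Craig" can produce an object that renders itself as an SQL fragment.

```python
class Condition:
    """A comparison that is rendered to SQL only when requested."""

    def __init__(self, column, operator, value):
        self.column = column
        self.operator = operator
        self.value = value

    def to_sql(self):
        # Return a parameterised fragment together with its bound values.
        return "%s %s ?" % (self.column, self.operator), [self.value]


class Field:
    """Stands for one column; comparisons build Condition objects."""

    def __init__(self, table, column):
        self.table = table
        self.column = column

    def _compare(self, operator, value):
        return Condition("%s.%s" % (self.table, self.column), operator, value)

    def __eq__(self, value):
        return self._compare("=", value)

    def __ne__(self, value):
        return self._compare("<>", value)

    def __lt__(self, value):
        return self._compare("<", value)

    def __gt__(self, value):
        return self._compare(">", value)


class QNamespace:
    """Hypothetical stand-in for the q attribute of a mapped class."""

    def __init__(self, table, columns):
        # columns maps Python attribute names to database column names.
        for attribute, column in columns.items():
            setattr(self, attribute, Field(table, column))


empl_q = QNamespace("empl", {"lastName": "last_name", "firstName": "first_name"})

# The comparison does not evaluate to True or False; it builds a Condition.
condition = empl_q.lastName == "Craig"
sql, params = condition.to_sql()
print(sql, params)
```

Because the condition is an object rather than a string, a mapper is free to render it differently for each SQL dialect, which is the property the proposal in the next section relies on.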

5 Recursive Query in SQLObject

The main idea of the proposed recursive query support is to extend the mechanisms provided by ORMs with additional methods for passing recursive queries. Those methods should be able to create an SQL query representing a recursive CTE in the specific dialect of the target database. From the user's point of view they should represent a set of unmodifiable data: a recursive view. With those considerations in mind, the authors decided to base their design on the standard SQLObjectView class. Instances of this class can be queried, but cannot be altered.

The proposed user interface class is the ViewRecursive class. In order to create a recursive query, a developer should declare a new class inheriting from ViewRecursive and equip it with the set of class attributes described below:

– tables = [] - a list of SQLObject classes representing tables that will be used to form the recursive query.
– recursiveOn - a field of the base table used to join it with the CTE in a recursive step.
– recursiveTo - a field of the CTE used to join it with the base table in a recursive step.
– initialConditions = [] - a list of conditions that should be met by tuples from the initial SELECT subquery of the recursive query.
– predicateList = [] - a list of conditions that should be met by resulting tuples (instances).

The fields recursiveOn and recursiveTo form an equality predicate in the recursive part of the CTE's definition. This predicate is the basis of the recursion. The predicateList represents predicates used in the outer SELECT query that uses the CTE.

The following attributes are optional. They are used to provide additional functionality to the data search:

– maxLevel - an upper bound on the recursion depth.
– summands = [] - a list of fields which are going to be summed up. It should be noted here that the mapper does not check type compatibility.
– concats = [] - a list of fields whose values will be concatenated, with a single whitespace character as a separator between arguments.
– constants = [] - a list of fields whose values are going to be repeated in all iterations.
The maxLevel attribute is especially useful when searching through a cyclic structure. Omitting this attribute in such a case may lead to infinite loop calculations. The usage of constants is convenient in situations where tree invariants are created during the recursion steps. A tree invariant is an attribute value that, once generated in the initial step, is constant for all tuples generated out of an initial tuple. The existence of a tree invariant allows for the application of additional optimisation transformations described in [2]. Examples 6 and 7 present the usage of this construction to express the recursive queries described earlier.
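How attributes of this kind can be turned into the text of a recursive CTE may be sketched as follows. This is not the authors' prototype: the function build_recursive_cte, its parameters and the generated query layout are hypothetical, only the counterparts of recursiveOn, recursiveTo, initialConditions and maxLevel are covered, and SQLite (through the standard sqlite3 module) is assumed as the target so that the result can be executed directly.

```python
import sqlite3


def build_recursive_cte(table, columns, recursive_on, recursive_to,
                        initial_condition, max_level=None):
    """Return SQL text for a recursive CTE over `table` (hypothetical helper).

    recursive_on is a column of the base table and recursive_to a column of
    the CTE; together they form the equality predicate of the recursive step.
    All arguments are assumed to be trusted identifiers or SQL fragments.
    """
    seed_cols = ", ".join(columns)
    step_cols = ", ".join("t." + c for c in columns)
    step_cond = "t.{0} = r.{1}".format(recursive_on, recursive_to)
    if max_level is not None:
        # A level counter is added only when a depth bound is requested.
        seed_cols += ", 1 AS level"
        step_cols += ", r.level + 1 AS level"
        step_cond += " AND r.level < {0}".format(int(max_level))
    return (
        "WITH RECURSIVE rcte AS (\n"
        "  SELECT {seed} FROM {table} WHERE {init}\n"
        "  UNION\n"
        "  SELECT {step} FROM {table} t, rcte r WHERE {cond}\n"
        ")\n"
        "SELECT * FROM rcte"
    ).format(seed=seed_cols, table=table, init=initial_condition,
             step=step_cols, cond=step_cond)


# A fragment of the empl relation: (id, last_name, boss_id).
SAMPLE_ROWS = [
    (1, "Travolta", None), (2, "Willis", 1), (9, "Connery", 2),
    (11, "Moore", 9), (3, "Monroe", None),
]


def subordinate_names(max_level=None):
    """Generate the query for Travolta's subordinates and run it on SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE empl (id INTEGER PRIMARY KEY, last_name TEXT, "
            "boss_id INTEGER)"
        )
        conn.executemany("INSERT INTO empl VALUES (?, ?, ?)", SAMPLE_ROWS)
        sql = build_recursive_cte(
            "empl", ["id", "last_name", "boss_id"],
            recursive_on="boss_id", recursive_to="id",
            initial_condition="last_name = ?", max_level=max_level,
        ) + " ORDER BY id"
        return [row[1] for row in conn.execute(sql, ("Travolta",))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(subordinate_names())
    print(subordinate_names(max_level=2))
```

In this sketch the level column exists only when a bound is given. On cyclic data an unbounded query of this shape would revisit the same rows, which is one way to see why the text recommends maxLevel for cyclic structures.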


Example 6. A definition of a class corresponding to Example 2:

    class Subordinates(ViewRecursive):
        tables = [Empl]
        recursiveOn = Empl.q.boss
        recursiveTo = Empl
        initialConditions = [Empl.q.lastName == "Travolta"]

This class has a method getRecursive inherited from ViewRecursive. This method generates the query presented in Example 2 using the information provided by the attributes of the class. Next, it creates read-only objects containing the fields id, first_name, last_name and boss_id. The boss_id field is a pointer to the proper Empl class instance. The declaration of this class is simpler than the query from Example 2, but the gain is not too striking yet.

Example 7. A definition of a class corresponding to Example 3.

    class Travel(ViewRecursive):
        tables = [Conns]
        recursiveOn = Conns.q.cityStart
        recursiveTo = Conns.q.cityEnd
        maxLevel = 4
        summands = [Conns.q.travelTime]
        concats = [Conns.q.cityEnd, Conns]
        constants = [Conns.q.cityStart]
        initialConditions = [Conns.q.cityStart == "Torun"]
        predicateList = [Conns.q.cityEnd == "Zakopane"]

The method getRecursive of the Travel class generates the query from Example 3. This query results in a collection of objects with the fields conns, city_end, travel_time, city_start, concat_city_end, concat_id, concat_city_start, constant_city_start, sum_of_travel_time and level. The constant_city_start field is the tree invariant described above. Now the simplicity of recursive queries written according to our proposal is apparent. It is enough to compare the complex code from Example 3 with the easily comprehensible class definition from Example 7.

To acquire the requested data from the database the getRecursive method is called. This function first generates a recursive query, which is then passed to the database. The result of the query is used to create a collection of read-only objects representing the data. In particular, calling Travel.getRecursive() will generate and execute the query from Example 3.
Although the instances of a recursive view class such as Travel are read-only, they contain pointers to fully modifiable objects. This correlation is presented by Example 8:

Example 8.

    emps = Subordinates.getRecursive()
    for e in emps:
        if e.lastName == "Craig":
            e.boss.firstName = "Bruce"


In this query the variable e representing Craig is read-only. However, e.boss points to the object representing his boss (Travolta, according to Example 1), which is an Empl class instance. This class of objects allows for data modifications.

The proposed solution significantly shortens the time required for preparing a recursive query. It also has a great impact on the readability and maintainability of the code. A developer no longer needs to focus on the syntax and complexities of recursive SQL queries. Instead, they may concentrate on the pure data they want to gather. An additional benefit of our method is that it provides a cross-platform solution, with only small adjustments in configuration needed to fit specific dialects.
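The relationship described for Example 8, read-only result objects that reference ordinary modifiable objects, can be sketched in plain Python. The classes below are hypothetical illustrations and are unrelated to the SQLObject or ViewRecursive code: Employee stands in for an Empl instance and ReadOnlyRow for an object returned by getRecursive. Following the data of Example 1, Craig's boss is Travolta.

```python
class Employee:
    """Ordinary mutable object, standing in for an Empl instance."""

    def __init__(self, first_name, last_name, boss=None):
        self.first_name = first_name
        self.last_name = last_name
        self.boss = boss


class ReadOnlyRow:
    """Immutable view over one result tuple; referenced objects stay mutable."""

    __slots__ = ("_values",)

    def __init__(self, **values):
        # Bypass our own __setattr__, which rejects every assignment.
        object.__setattr__(self, "_values", dict(values))

    def __getattr__(self, name):
        # Called only when normal lookup fails, that is, for row fields.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("row objects are read-only: %s" % name)


travolta = Employee("John", "Travolta")
craig_row = ReadOnlyRow(first_name="Daniel", last_name="Craig", boss=travolta)

# Assigning to a field of the row itself is rejected ...
try:
    craig_row.last_name = "Bond"
    row_is_writable = True
except AttributeError:
    row_is_writable = False

# ... but the object the row points to can still be modified.
craig_row.boss.first_name = "Bruce"
print(row_is_writable, travolta.first_name)
```

The row protects only its own fields. Anything reachable through a reference keeps the mutability of its own class, which is the behaviour the text attributes to the boss pointer.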

6 Conclusions and Future Work

In this paper we presented a proposal for incorporating recursive queries into object-relational mappers. So far, such mappers do not facilitate recursive queries. Furthermore, the presented method allows recursive queries to be expressed in a notably simpler way than is now possible in SQL. Future research plans include developing prototypes for other databases, in particular Oracle, IBM DB2 and MS SQL Server. The next step would be porting the presented algorithm to the Django models mapper. Another interesting direction would be the translation of the proposed solutions to other ORMs and languages, for example Java and C#. In particular, the problem of integrating recursive queries into LINQ for C# seems a very promising research topic. Note also that our method does not use strings to express any metainformation in the query. This means that in statically typed languages queries can be type checked in every single detail.

References

1. Boniewicz, A., Burzanska, M., Przymus, P., Stencel, K.: Recursive query facilities in relational databases: a survey. Manuscript sent for DTA (2010) [if both papers are accepted, the editor may change this reference]
2. Burzanska, M., Stencel, K., Wisniewski, P.: Pushing Predicates into Recursive SQL Common Table Expressions. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds.) ADBIS 2009. LNCS, vol. 5739, pp. 194–205. Springer, Heidelberg (2009)
3. JDBC Driver class documentation, http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Driver.html
4. SQLObject, http://www.sqlobject.org/
5. q-magic, http://www.sqlobject.org/SQLObject.html#q-magic
6. Recursive queries in PostgreSQL, http://www.postgresql.org/docs/8.4/static/queries-with.html
7. Ghazal, A., Crolotte, A., Seid, D.Y.: Recursive SQL Query Optimization with k-Iteration Lookahead. In: Bressan, S., Küng, J., Wagner, R. (eds.) DEXA 2006. LNCS, vol. 4080, pp. 348–357. Springer, Heidelberg (2006)
8. Ordonez, C.: Optimization of Linear Recursive Queries in SQL. IEEE Trans. Knowl. Data Eng., 264–277 (2010)
9. Melnik, S., Adya, A., Bernstein, P.A.: Compiling mappings to bridge applications and databases. In: ACM SIGMOD, pp. 461–472 (2007)
10. Keller, W.: Mapping objects to tables: A pattern language. In: EuroPLoP (2007)
11. Codd, E.F.: A relational model of data for large shared data banks. Communications of the ACM (1970)
