On the Issues of Expressiveness and Portability of Chiql

Gary C.K. Lam, Vincent Y. Lum and Kam-Fai Wong
Department of Systems Engineering and Engineering Management
Chinese University of Hong Kong, Shatin, N.T., Hong Kong

Abstract

Chiql is a Chinese database query language applicable to both novice and expert users. The language was designed with the following features: (1) It is based on a form of natural Chinese language syntax. (2) Chiql combines declarative and procedural querying styles. (3) The language is relationally complete. The first feature renders Chiql easy to learn for any user who understands Chinese; the second feature makes the language easy to use even for non-experts; and the third feature indicates that the expressive power of Chiql can cover any relational data manipulation operation. To further evaluate Chiql's expressive power, Chiql was tested against the Lacroix query benchmark. Out of the 66 test queries in the benchmark, Chiql could specify 65 and SQL only 60. This indicates that the expressive power of Chiql is higher than that of SQL. Nevertheless, SQL is the de facto industrial database query standard, and today most database systems support it. Therefore, for practical reasons, a Chiql to SQL translator is being developed. The translator will make Chiql compatible with existing technology (i.e. SQL), and users will be able to use Chiql in advance of the Chiql database engine. In this paper, the expressiveness testing and the Chiql/SQL translation process are described.

1 Introduction

As the Chinese economy grows, the amount of information that needs to be handled daily increases rapidly. Manual handling of the increasing amount of information would be impractical, if not impossible. For this reason, there is a need to automate information processing, e.g. by using Database Management Systems (DBMS). Today DBMSs in China are Western products. Since they are ingrained with Western culture, ordinary Chinese users often find them difficult to use. It is strategically critical to overcome this limitation. To this end, an unconventional Chinese database query language, namely Chiql [1, 2], is being developed.

Chiql was designed under the principle of "Integration of Chinese Culture into Technology" [3]. Integrating Chinese culture into a system is not just a matter of changing the English words to Chinese characters at the interface. Chinese does not express things the same way as English, and the semantics of one frequently differ from the other. Therefore, the design of Chiql has taken both the syntax and the semantics of the Chinese language into account. The design goal is a database query language which is easy to learn and easy to use for anyone who understands Chinese, novice and expert users alike.

Some researchers have attempted to introduce Chinese database query languages, but the results are not encouraging:

Intuitively, one could specify database queries using natural language. Naturally, this can make the query language easy to learn and to use. However, a query interface which accepts natural language in free form must embed complex Natural Language Understanding (NLU) mechanisms. NLU, in general, is a complicated and time-consuming process. Incorporating it in the query language interpreter would not be cost-effective, because the grammar of the query language will only be a very small subset of the full natural language grammar. For this reason, it is more practical to define a set of restricted grammar rules specific to the database query language. Some research projects, e.g. [4, 5], have attempted to develop Chinese natural language query interfaces using well-defined grammars. Nevertheless, the results presented are limited; in addition, only simple query examples have been shown.

Other researchers have attempted to "chinesize" (i.e. Hanzihua) an English-based database query language, such as SQL [6, 7]. SQL was designed to reflect somewhat the flavour of natural English and therefore has a structure and style similar to written English. The Chinese language is very different from English, and what is natural in English can be completely unnatural in Chinese. For example, Chinese seldom has its qualification predicate expressed at the end of a sentence, as is done in SQL. Also, Chinese has no nested structures of the kind found in SQL and natural English. For this reason, "chinesized" SQL interfaces are not easy to use for someone who has no background knowledge of English.

Proceedings of the Fourth International Conference on Database Systems for Advanced Applications (DASFAA'95), Ed. Tok Wang Ling and Yoshifumi Masunaga, Singapore, April 10-13, 1995. © World Scientific Publishing Co. Pte Ltd

Chiql is based on restricted Chinese natural language with a well-defined grammar. Unlike the design of other Chinese database query languages, such as [4, 5], the expressive power of Chiql has been thoroughly evaluated. It has been tested comprehensively using the 66-query set defined by Lacroix et al. [8, pp512-527].

Practicality is an important issue in the design of Chiql. A Chiql to SQL¹ translator is being developed for this purpose. The idea is to enable execution of Chiql queries on SQL DBMS engines. Since most DBMSs today support SQL, the translator will render Chiql compatible with existing technology.

In this paper, the results of Lacroix's 66-query test and the Chiql/SQL translation algorithm are described. The rest of the paper is organised as follows: in the next section, the features of the Chiql language are outlined; section 3 describes the result of the 66-query test; the Chiql/SQL translation process is described in section 4; section 5 describes the problems related to the translation process; and this is followed by the conclusion in section 6.

2 Features of Chiql

The Chiql language fulfills the following requirements [2]:

R1. It is easy to use for the development of both simple and complex applications.

R2. In terms of expressiveness, it must be at least as powerful as standard SQL.

The restricted natural language format and the combined procedural and declarative query style of Chiql make both simple and complex queries easy to specify. Also, Chiql is relationally complete. Relational completeness is fundamental for relational data manipulation, which is the basis of SQL.

2.1 Restricted Chinese Natural Language

There are eleven well-defined formats to specify a Chiql query. The first five formats are used for single-table manipulation and the last six are for multiple tables. The eleven query formats are shown below and the corresponding SQL representations are shown in Appendix A:

[Query formats 1 to 11, written in Chinese, are not recoverable from the source text.]

¹ Based on the ISO/ANSI SQL1 standard (ISO/DIS 9075).

Note: The output clause of the eleven Chiql query formats is optional. Without it, the output of the query is directed to the standard output device.

It is apparent from the Chiql query formats that the language is easy to learn and easy to use. Each format is a well-formed Chinese sentence. Active users need to memorise very little to use Chiql: the only requirement is that the user knows Chinese and the eleven query formats. From a passive user's point of view, Chiql is "What You See Is What You Get" (WYSIWYG), as the language semantics of a Chiql query directly reflects its operational semantics.

2.2 Declarative and Procedural Query Style

Welty et al. [9] and many other relational database practitioners argued strongly against the statement "Declarative language is better than procedural language". SQL is a declarative query language. It offers high-level abstraction, so users need not be concerned with how a query is executed. But the expressive power of SQL is restricted by the declarative query style. Often a complex query cannot easily be specified declaratively. Even if one can achieve that, the resultant SQL query is usually incomprehensible (see the example in [2, p74]). A Chiql query consists of one or more Chiql statements, each referred to as a sub-query. While each Chiql sub-query is declarative in nature, the complete query itself is procedural. Using this query style, users can specify a query request² more naturally than using SQL, where the query style is purely declarative and single-statement based.

² A query request is the intention of a query, e.g. "Retrieve everything from table A" is a query request and "SELECT * FROM A" is the corresponding query in SQL form.

2.3 Relational Completeness

Foremost in the fulfillment of the R2 requirement, Chiql must be relationally complete; otherwise, it would not be able to cover all cases in relational data manipulation. Relational completeness is, in fact, a fundamental requirement for the design of any relational database query language. It requires that the language expresses at least five out of the eight basic relational operators, i.e. restriction, projection, product, union and difference. In Chiql, each of the five operations can simply be expressed using one of the eleven basic query formats (see section 2.1), viz:

    Relational Operator    Chiql Query Format
    Restriction            2
    Projection             1
    Product                6
    Union                  10
    Difference             11

3 Expressive Power of Chiql

To test the expressive power of Chiql as well as to compare it with SQL (i.e. requirement R2), Lacroix's benchmark queries are used. The benchmark was specially designed for expressiveness evaluation:

"If we can program all 66 queries in a programming language, that language's expressive power is at least adequate." [8, p512]

Out of the 66 test queries, Chiql could handle 65 (1 failure) and SQL 60 (6 failures). This result implies that Chiql is more expressive than SQL. The test queries which they failed to express are as follows:

Q58: List each employee and the difference of his(her) salary from the average salary of his(her) department.
NOTE: "the average salary" needs to be stored in a temporary table before the salary difference can be calculated. Two SQL statements are required to express this test query.

Q59: What is, for each supplier, the average number of items per department that the supplier supplies?
NOTE: Two GROUP-BY operations are required to work out the following part: "for each supplier, ... per department". This implies two SQL statements.

Q61: Give the overall average of the salaries in all departments.
NOTE: Two GROUP-BY operations are required to work out the following part: "overall average of all departments". This implies two SQL statements.

Q62: List for each employee, his(her) salary, the average salary of the department where he(she) works, and the difference of his(her) salary from the average salary of his(her) department.
NOTE: "the average salary" needs to be stored in a temporary table before the salary difference can be calculated. Two SQL statements are required to express this test query.

Q64: What is the average volume of items of type A supplied per supplier and per department (such that the supplier supplies the items of the type A to the department)?
NOTE: Two GROUP-BY operations are required to work out the following part: "per supplier and per department". This implies two SQL statements.

Q66: Is it true that all the departments that sell items of type A are located on the third floor?
NOTE: A boolean return value, i.e. True or False, is required here. But neither SQL nor Chiql supports this.

Due to the declarative query style, one must express a query request, simple or complex, in a single SQL statement. This is the reason why SQL cannot handle Q58, Q59, Q61, Q62 and Q64. If multi-statement support were available in SQL³, one possible solution to the problem would be as follows⁴:

³ Multi-statement support is proposed in the SQL2 and SQL3 standards [10].
⁴ For simplicity, but without loss of generality, the GROUP-BY problem, i.e. Q59, Q61 and Q64, is used here as an example.

1. Create a View to store the temporary result from the first GROUP-BY operation.

2. Perform the second GROUP-BY operation to compute the final result.

3. Finally, drop the temporary View at the end of the transaction.

For example, based on the above solution⁵, the SQL specification of Q59 is:

    CREATE VIEW temp1 AS (
        SELECT supplier, dept, count(item) countitem
        FROM Supply
        GROUP BY supplier, dept );

    SELECT avg(countitem)
    FROM temp1
    GROUP BY supplier;

    DROP VIEW temp1;

Since Chiql supports multi-statement queries, there is no problem in using it to specify Q58, Q59, Q61, Q62 and Q64.

⁵ The schemas of the Department and Supplier relations are not important in this discussion. See [8] for details.

4 Chiql/SQL Translation

For practical reasons, a Chiql/SQL translator is designed. The SQL equivalent of any Chiql query can be produced by the translator. Users can specify a query request in Chiql, translate it to SQL and execute it on existing SQL DBMS engines. The translator is a strategically important tool: it will put the Chiql language into practice in advance of the availability of the Chiql DBMS engine. Consider the second query request in Lacroix's 66-query test suite:

"Q2: Find the items sold by no department on the second floor."

In Chiql, the request can be formulated as follows:

[Two Chiql statements, written in Chinese, are not recoverable from the source text.]

Both of these Chiql statements are derived from the second Chiql query format (see section 2.1)⁶. To translate this Chiql query to SQL, the following operations will be performed:

⁶ In this Chiql query, the first statement uses an output clause but the second statement does not.

4.1 Mapping(input = Chiql, output = multi-statement SQL)

The input Chiql⁷ may consist of one or more Chiql statements. Each of the Chiql statements is directly mapped to its SQL equivalent. This is a one-to-one mapping, for each Chiql query format (see section 2.1) has a corresponding SQL representation. Appendix A shows the complete Chiql to SQL mapping list. In this example, using the mapping list in Appendix A, the resultant multi-statement SQL query is as follows:

    CREATE VIEW temp1 AS (
        SELECT name
        FROM Department
        WHERE floor='2' );

    SELECT item
    FROM Sales
    WHERE dept NOT IN (
        SELECT *
        FROM temp1 );
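As a rough illustration of what the Mapping() stage produces, the two statements for Q2 can be run one after another on an engine that accepts multiple statements. The sketch below uses Python's built-in sqlite3 module; the sample rows are hypothetical (the paper does not list the contents of Department or Sales), and the parentheses around the view body are dropped because SQLite does not accept them there.

```python
import sqlite3

# Hypothetical sample data; not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (name TEXT, floor TEXT);
    CREATE TABLE Sales (dept TEXT, item TEXT);
    INSERT INTO Department VALUES ('toys', '2'), ('books', '1');
    INSERT INTO Sales VALUES ('toys', 'robot'), ('books', 'novel'), ('books', 'atlas');
""")

# Statement 1 of the mapped query: departments on the second floor.
conn.execute("CREATE VIEW temp1 AS SELECT name FROM Department WHERE floor = '2'")

# Statement 2: items whose selling department is not listed in temp1.
rows = conn.execute(
    "SELECT item FROM Sales WHERE dept NOT IN (SELECT * FROM temp1) ORDER BY item"
).fetchall()
q2_items = [row[0] for row in rows]

# Clean up the temporary view, as the multi-statement scheme prescribes.
conn.execute("DROP VIEW temp1")
print(q2_items)
```

With these assumed rows, only the items sold by the first-floor department remain in the result.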

4.2 Nesting(input = multi-statement SQL, output = single statement SQL)

A multi-statement Chiql query gives rise to the corresponding multi-statement SQL after the Mapping() stage. However, most existing SQL database engines support only the SQL1 standard and do not cater for multiple statements. In the Nesting() stage, multiple SQL statements are combined to form a single SQL statement. It is noteworthy that although some researchers have attempted to transform a nested (single statement) SQL query to its unnested counterpart (multiple statements) for optimisation purposes [11, 12, 13, 14, 15], no one has attempted the transformation in the opposite direction, which is the case in the Chiql to SQL translation process. The Nesting() algorithm starts from the last SQL statement and works its way backwards, statement by statement, until it finishes processing the first one. The goal is to transform the intermediate unnested SQL query to its nested SQL counterpart. The algorithm, expressed in pseudo code, is shown in figure 1.

⁷ The input Chiql query is assumed to be syntactically correct.
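The backward, statement-by-statement substitution described above can be sketched in a much simplified form. The function below is not the authors' Nesting() algorithm: it is a hypothetical helper (the name `nest` and its input format are assumptions) that only handles the one pattern `( SELECT * FROM view_name )` and replaces it with the body of the view, working from the last view to the first.

```python
import re

def nest(views, final_query):
    """Inline view bodies into final_query, last-defined view first.

    views: list of (view_name, select_text) pairs in creation order.
    Only the pattern "( SELECT * FROM view_name )" is recognised.
    """
    query = final_query
    for name, body in reversed(views):
        pattern = re.compile(
            r"\(\s*SELECT\s+\*\s+FROM\s+" + re.escape(name) + r"\s*\)",
            re.IGNORECASE,
        )
        # A function replacement keeps any backslashes in body literal.
        query = pattern.sub(lambda m, b=body: "( " + b + " )", query)
    return query

views = [("temp1", "SELECT name FROM Department WHERE floor = 2")]
final = "SELECT item FROM Sales WHERE dept NOT IN ( SELECT * FROM temp1 )"
nested = nest(views, final)
print(nested)

# A chain of two views: t2 refers to t1, so t2 must be inlined first.
chain = nest(
    [("t1", "SELECT a FROM X"),
     ("t2", "SELECT b FROM Y WHERE b IN ( SELECT * FROM t1 )")],
    "SELECT c FROM Z WHERE c IN ( SELECT * FROM t2 )",
)
```

Processing the views in reverse order matters: inlining a later view can expose a reference to an earlier one, which the next iteration then resolves.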

PROCEDURE Nesting()
BEGIN
    output the SELECT-clause directly;
    Process-the-FROM-clause();
    Process-the-WHERE-clause();
    Process-the-GROUP-BY-clause();
    Process-the-HAVING-clause();
END PROCEDURE;

PROCEDURE Process-the-FROM-clause()
BEGIN
    FOR each table in the FROM-clause
    BEGIN
        IF the table exists in system catalogue THEN
            output the table's name;
        ELSE
            inspect the previous statement by calling Nesting() to find the source of the temporary table;
            output the original table name;
        END IF;
    END FOR;
END PROCEDURE;

PROCEDURE Process-the-WHERE-clause()
BEGIN
    FOR each condition in the WHERE-clause
    BEGIN
        IF the components of the condition exist in system catalogue THEN
            output the condition;
        ELSE
            inspect the previous statement by calling Nesting() to find the source of the component;
            output the condition with the original component's name;
        END IF;
    END FOR;
END PROCEDURE;

PROCEDURE Process-the-GROUP-BY-clause()
BEGIN
    output the GROUP-BY-clause;
END PROCEDURE;

PROCEDURE Process-the-HAVING-clause()
BEGIN
    FOR each condition in the HAVING-clause
    BEGIN
        IF the components of the condition exist in system catalogue THEN
            IF GROUP-BY has been used before in this statement THEN
                output the condition as HAVING-clause;
            ELSE
                output the WHERE-clause;
            END IF;
        ELSE
            inspect the previous statement by calling Nesting() to find the source of the component;
            output the original component name;
        END IF;
    END FOR;
END PROCEDURE;

Figure 1: The algorithm for unnested to nested SQL transformation.

Application of the Nesting() algorithm to the above unnested SQL query results in the following nested SQL query formulation:

    SELECT item
    FROM Sales
    WHERE dept NOT IN (
        SELECT name
        FROM Department
        WHERE floor=2 );

5 Problems in Chiql/SQL Translation

SQL is by no means flawless [16]. Two issues related to semantic ambiguity in SQL, in particular, have caused some difficulties in the Chiql/SQL translation process. They are the "count bug" [14] and "duplicate rows" problems. In this section, these as well as the "expressiveness" problem are described.

5.1 The "count bug" problem

Consider the following tables, R1 and R2:

[Tables: Relation R1 and Relation R2; the column layout is not recoverable from the source text.]

Query Request: For each row in R1, if the value of its B entry is greater than or equal to the number of times that its (the row's) corresponding C value appears in R2, then output the value of A.

The query request can be specified using standard SQL. This will result in the following nested query:

    SELECT R1.A
    FROM R1
    WHERE R1.B >= (
        SELECT COUNT(*)
        FROM R2
        WHERE R1.C = R2.C );

On the other hand, using multiple statements, Kim [11] pointed out that the above nested SQL query could be represented in the following unnested format:

    CREATE VIEW temp1 AS (
        SELECT R2.C, COUNT(*)
        FROM R2
        GROUP BY R2.C );

    SELECT R1.A
    FROM R1, temp1
    WHERE (R1.C = temp1.C) AND (R1.B >= temp1.COUNT);

In general, both the nested and unnested SQL representations are equivalent (i.e. produce the same result). However, there is an exceptional case. Consider the following row entries in R1 and R2:

[Row entries of R1 and R2 are not recoverable from the source text.]
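As an illustration with assumed rows, the two formulations can be compared directly in SQLite. The values below are hypothetical, chosen only so that row r1 of R1 has matching C values in R2 while row r2 (with B = 1) has none; the count column of the view is given the alias CNT, which is also an assumption made to keep the SQL portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (A TEXT, B INTEGER, C TEXT);
    CREATE TABLE R2 (C TEXT);
    -- Assumed rows: r1 matches two R2 rows, r2 matches none.
    INSERT INTO R1 VALUES ('a1', 2, 'c1'), ('a2', 1, 'c2');
    INSERT INTO R2 VALUES ('c1'), ('c1');
""")

# Nested form: the inner COUNT(*) is evaluated once per R1 row.
nested = [row[0] for row in conn.execute("""
    SELECT R1.A FROM R1
    WHERE R1.B >= (SELECT COUNT(*) FROM R2 WHERE R1.C = R2.C)
    ORDER BY R1.A
""")]

# Unnested form: counts are grouped first, then joined to R1.
conn.execute(
    "CREATE VIEW temp1 AS SELECT R2.C AS C, COUNT(*) AS CNT FROM R2 GROUP BY R2.C"
)
unnested = [row[0] for row in conn.execute("""
    SELECT R1.A FROM R1, temp1
    WHERE R1.C = temp1.C AND R1.B >= temp1.CNT
    ORDER BY R1.A
""")]
conn.execute("DROP VIEW temp1")

print(nested, unnested)
```

The row without a match survives in the nested form because its count is zero, but it never reaches the comparison in the unnested form because the join drops it.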

Application of the nested query to the above tables will result in the solution set (a1, a2); but the solution set of the unnested query is different, namely (a1). This happens for the following reason. In the nested query, even though row r2 in table R1 does not satisfy the predicate in the inner query (i.e. R1.C = R2.C), the aggregate function COUNT(*) returns zero to the upper query block and hence R1.B (i.e. 1) is greater than this COUNT (i.e. zero); as a result, row r2 is selected. This selection mechanism is based on the TIS (Tuple Iterative Semantics) [14] of nested SQL. Under TIS, the satisfaction of the inner and outer query predicates (refer to the above nested query) does not take place at the same time. On the other hand, it is explicitly specified in the unnested query that both predicates must be satisfied for a row to be selected. For that reason, only row r1 in table R1 satisfies the query, and not both r1 and r2.

According to the original query request, the correct answer should be (a1). This implies that one should adopt the unnested representation. However, the existing SQL standard (i.e. SQL1) does not support multiple statements. In view of this, Mu [15] used an outer join [18, pp164-165] rather than the normal natural join in the join predicate (i.e. R1.C = R2.C). For example, using the explicit outer join specification in ORACLE [19, section 6.14], the nested query is represented as follows:

    SELECT R1.A
    FROM R1
    WHERE R1.B >= (
        SELECT COUNT(*)
        FROM R2
        WHERE R1.C = R2.C (+) );

Since explicit outer join specification is not part of the existing SQL standard, it is not used in the Chiql/SQL translation process. The translator produces the unnested representation. For this reason, the resultant SQL query cannot be executed on SQL database engines which do not support multiple statements.

5.2 The "duplicate rows" problem

At present, unless users explicitly specify otherwise, many DBMSs support duplicate rows [16]. Due to duplicate rows, the results from a nested query can differ from those of its unnested counterpart. This inevitably creates semantic ambiguity in Chiql/SQL translation, for the original Chiql is unnested and the resultant SQL is nested. Consider the following example: Employee and Location are two relational tables. The Employee table describes the employee's number (e_no), his/her working place (office) and the source of his/her salary (funded_by). The Location table contains the departments (dept) and their corresponding locations (floor), viz:

[Tables Employee (e_no, office, funded_by) and Location (floor, dept): only fragments of the rows are recoverable from the source text.]

Notice that the first and the second rows in the Location table are the same, i.e. (f1, d1); they are "duplicate rows". Over these tables, one would like to "find the employee's number whose working department is listed in the location table". In unnested formulation, this query request can be specified as follows:

    CREATE VIEW temp1 AS (
        SELECT e_no, office
        FROM Employee
        GROUP BY e_no, office );

    SELECT e_no
    FROM temp1, Location
    WHERE office = dept;

    DROP VIEW temp1;

Specification of the same request in nested form will result in the following SQL query:

    SELECT e_no
    FROM Employee
    GROUP BY e_no, office
    HAVING office IN (
        SELECT dept
        FROM Location );

Although both the nested and unnested queries express the same query request, their results are different. The unnested query results in duplicate rows but the nested query does not (see below).

[Result tables are not recoverable from the source text.]
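The difference between the two formulations can be reproduced with a small SQLite session. The table contents below are assumptions: they follow the description in the text (a Location table whose first two rows are both (f1, d1)), but the remaining rows and the funded_by values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (e_no TEXT, office TEXT, funded_by TEXT);
    CREATE TABLE Location (floor TEXT, dept TEXT);
    -- Assumed rows; only the duplicated (f1, d1) pair is stated in the text.
    INSERT INTO Employee VALUES ('e1', 'd1', 'x'), ('e2', 'd2', 'x'), ('e3', 'd3', 'x');
    INSERT INTO Location VALUES ('f1', 'd1'), ('f1', 'd1'), ('f2', 'd2');
""")

# Unnested form: the join emits one output row per matching Location row.
conn.execute(
    "CREATE VIEW temp1 AS SELECT e_no, office FROM Employee GROUP BY e_no, office"
)
unnested = sorted(
    row[0] for row in conn.execute(
        "SELECT e_no FROM temp1, Location WHERE office = dept"
    )
)
conn.execute("DROP VIEW temp1")

# Nested form: IN is a membership test, so each employee appears at most once.
nested = sorted(
    row[0] for row in conn.execute(
        "SELECT e_no FROM Employee GROUP BY e_no, office "
        "HAVING office IN (SELECT dept FROM Location)"
    )
)

print(unnested, nested)
```

Under these assumptions, employee e1 is listed twice by the unnested query, once for each copy of (f1, d1), and once by the nested query.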

To avoid this "duplicate rows" problem, the Chiql/SQL translator introduces the keyword "DISTINCT" to every SELECT clause, e.g.

    CREATE VIEW temp1 AS (
        SELECT DISTINCT e_no, office
        FROM Employee
        GROUP BY e_no, office );

    SELECT DISTINCT e_no
    FROM temp1, Location
    WHERE office = dept;

    DROP VIEW temp1;

5.3 The "expressiveness" problem

As shown in the results of the expressiveness test (see section 3), there are five query requests which SQL failed to express but Chiql did not. The Chiql/SQL translator handles these five cases specially: the nesting stage is omitted and the resultant SQL query is left in unnested form. These five unnested SQL queries, however, will not be directly executable by conventional SQL1-based DBMS engines.

6 Conclusion

Chiql is a new query language for Chinese database users. Its expressive power has been thoroughly tested using Lacroix's benchmark. Out of the 66 test queries, Chiql failed to represent only one. The test query on which Chiql failed involved the return of a boolean value. In the future, boolean operations will be introduced to the language. Moreover, the test result has shown that Chiql is more expressive than SQL.

A Chiql/SQL translator is being developed. In the translation process, an unnested Chiql query is first transformed to its unnested SQL equivalent and, whenever possible, the latter is further transformed to a nested SQL query. Some work has been done on the transformation of nested to unnested queries; but the transformation in the opposite direction (e.g. Chiql to SQL) is new.

In addition to the Chiql/SQL translator, a Chiql compiler and a Chiql X-window based graphical user interface system are currently under development. Once the translator is completed, Chiql will be introduced to the "Introduction to Database" course. The students will help evaluate the practicality of the language by using a conventional SQL DBMS as the execution platform.

Acknowledgement

The Chiql project is partially supported by the Research Grant Committee (RGC) of Hong Kong under the contract number CUHK/256/94E.

References

[1] Lum, V. and Zhang, S., "A Chinese Database Management System Interface", Journal of Chinese Information Processing, Vol.8, No.2, published by the Chinese Information Society of China, Beijing, 1994, pp26-31. (in Chinese)

[2] Lum, V., Wong, K.F. and Lam, G.C.K., "Chiql - an Unconventional Chinese Database Query Language", In Proc. 1994 International Conference on Computer Processing of Oriental Languages, pp69-74, Taejon, Korea, May 10-13, 1994.

[3] Lum, V., "Advancing Computerization in China by Integrating Cultural Aspects into Technology", invited presentation in the International Workshop on Natural Science Research Strategic Planning, organised by NSFC, Beijing, China, August 3-6, 1994. (a paper related to the presentation is available from the Department of Systems Engineering and Engineering Management, Chinese University, Hong Kong)

[4] Lu, G.-M. and Chen, Q.-B., "The Design and Implementation of Chinese Query Interface to Database System", Journal of Chinese Information Processing, vol.5(4), published by the Chinese Information Society of China, Beijing, 1991, pp43-49. (in Chinese)

[5] He, K., Feng, M. and Li, X., "A Chinese Interface Inquiring into Relational Database", In Proc. of Intelligent Systems for Processing Oriental Languages, pp271-273, Florida, Dec. 15-18, 1992.

[6] Wu, Z. and Gao, G.F., "CDSA Model and its Realization by NLI to Relational Database", Journal of Chinese Information Processing, vol.6(1), published by the Chinese Information Society of China, Beijing, 1992, pp8-15. (in Chinese)

[7] Zhang, H., Chang, G., Yang, C. and Yan, P., "A General Chinese Natural Language Interface", In Proc. of Intelligent Systems for Processing Oriental Languages, pp265-270, Florida, Dec. 15-18, 1992.

[8] Ozkarahan, E., Database Management: Concepts, Design and Practice, Prentice-Hall, 1990.

[9] Welty, C. and Stemple, D.W., "Human Factors Comparison of a Procedural and a Non-procedural Query Language", ACM Transactions on Database Systems, Vol. 6, No. 4, Dec., 1981.

[10] Melton, J., "Database Language SQL2 and SQL3", ISO-ANSI working draft, ISO DBL CAN-2b, May 1989.

[11] Kim, W., "On Optimizing an SQL-like Nested Query", ACM Trans. on Database Systems, Vol. 7, No. 3, September 1982, pp443-469.

[12] Ganski, E. and Wang, H.K.T., "Optimization of Nested SQL Queries Revisited," In Proc. of the ACM SIGMOD Conf., 1987.

[13] Dayal, U., "Of Nests and Trees: A Unified Approach to Processing Queries," In Proc. of the 13th Int. Conf. on VLDB, 1987.

[14] Muralikrishna, M., "Optimization and Dataflow Algorithms for Nested Tree Queries", In Proc. of 15th International Conference on VLDB, pp77-85, Amsterdam, 1989.

[15] Meng, X.F., "Unnesting Process of Nested Query", Master thesis, Renmin University, Beijing, China, 1993. (in Chinese)

[16] Date, C.J. and Warden, A., Relational Database Writings 1985-1989, Addison-Wesley.

[17] Kiessling, W., "On Semantic Reefs and Efficient Processing of Correlation Queries with Aggregates", In Proc. of the 11th International Conf. on VLDB, 1985.

[18] Elmasri, R. and Navathe, S.B., Fundamentals of Database Systems, Addison-Wesley, 1989.

[19] ORACLE, SQL Language Reference Manual, Version 6.0, 1990. (Part ...)

Appendix A: Chiql to SQL Mapping

[Only fragments of the mapping list are recoverable from the source text. The Chinese query formats are unreadable; the SQL side shows skeletons such as "CREATE VIEW ... SELECT ... FROM ... GROUP BY ...", "CREATE VIEW ... SELECT ... FROM ... intersect SELECT ... FROM ... );" and "CREATE VIEW ... AS ( SELECT ... FROM ... WHERE ... intersect SELECT ... FROM ... WHERE ...".]