A Critique of the SQL Database Language


C. J. Date
P O Box 2647, Saratoga, California 95070, USA

December 1983

The ANS Database Committee (X3H2) is currently at work on a proposed standard relational database language (RDL), and has adopted as a basis for that activity a definition of the "structured query language" SQL from IBM [10]. Moreover, numerous hardware and software vendors (in addition to IBM) have already released or at least announced products that are based to a greater or lesser extent on the SQL language as defined by IBM. There can thus be little doubt that the importance of that language will increase significantly over the next few years. Yet the SQL language is very far from perfect. The purpose of this paper is to present a critical analysis of the language's major shortcomings, in the hope that it may be possible to remedy some of the deficiencies before their influence becomes too all-pervasive. The paper's standpoint is primarily that of formal computer languages in general, rather than that of database languages specifically.


1. INTRODUCTION

The relational language SQL (the acronym is usually pronounced "sequel"), pioneered in the IBM prototype System R [1] and subsequently adopted by IBM and others as the basis for numerous commercial implementations, represents a major advance over older database languages such as the DL/I language of IMS and the DML and DDL of the Data Base Task Group (DBTG) of CODASYL. Specifically, SQL is far easier to use than those older languages; as a result, users in a SQL system (both end-users and application programmers) can be far more productive than they used to be in those older systems (improvements of up to 20 times have been reported). Among the strong points of SQL that lead to such improvements we may cite the following:

* simple data structure
* powerful operators
* short initial learning period
* improved data independence
* integrated data definition and data manipulation
* double mode of use
* integrated catalog
* compilation and optimization

These advantages are elaborated in the appendix to this paper.

The language does have its weak points too, however. In fact, it cannot be denied that SQL in its present form leaves rather a lot to be desired -- even that, in some important respects, it fails to realize the full potential of the relational model. The purpose of this paper is to describe and examine some of those weak points, in the hope that such aspects of the language may be improved before their influence becomes too all-pervasive.

Before getting into details, I should like to make one point absolutely clear: The criticisms that follow should not be construed as criticisms of the original designers and implementers of the SQL language. The paper is intended solely as a critique of the SQL language as such, and nothing more. Note also that the paper applies specifically to the dialect of SQL implemented by IBM in its products SQL/DS, DB2, and QMF. It is entirely possible that some specific point does not apply to some other implemented dialect. However, most points of the paper do apply to most of the dialects currently implemented, so far as I am aware.

The remainder of the paper is divided into the following sections:

* lack of orthogonality: expressions
* lack of orthogonality: builtin functions
* lack of orthogonality: miscellaneous items
* formal definition
* mismatch with host languages
* missing function
* mistakes
* aspects of the relational model not supported
* summary and conclusions

Reference [3] gives some background material -- specifically, a set of principles that apply to the design of programming languages in general and database languages in particular. Many of the criticisms that follow are expressed in terms of those principles.

Note: Some of the points apply to interactive SQL only and some to embedded SQL only, but most apply to both. I have not bothered to spell out the distinctions; the context makes it clear in every case. Also, the structure of the paper is a little arbitrary, in the sense that it is not really always clear which heading a particular point belongs under. There is also some repetition (I hope not too much), for essentially the same reason.


2. LACK OF ORTHOGONALITY: EXPRESSIONS

It is convenient to begin by introducing some nonSQL terms.

* A table-expression is a SQL expression that yields a table -- for example, the expression

     SELECT *
     FROM   EMP
     WHERE  DEPT# = 'D3'

* A column-expression is a SQL expression that yields a single column -- for example, the expression

     SELECT EMP#
     FROM   EMP
     WHERE  DEPT# = 'D3'

  A column-expression is a special case of a table-expression.

* A row-expression is a SQL expression that yields a single row -- for example, the expression

     SELECT *
     FROM   EMP
     WHERE  EMP# = 'E2'

  A row-expression is a special case of a table-expression.

* A scalar-expression is a SQL expression that yields a single scalar value -- for example, the expression

     SELECT AVG (SALARY)
     FROM   EMP

  or the expression

     SELECT SALARY
     FROM   EMP
     WHERE  EMP# = 'E2'

  A scalar-expression is a special case of a row-expression and a special case of a column-expression.
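The four kinds of expression can be tried out in any SQL system that is to hand. The following is a minimal sketch only, assuming Python's standard sqlite3 module rather than the IBM dialect discussed in this paper; the sample rows are hypothetical, and the columns EMP# and DEPT# are renamed EMPNO and DEPTNO because SQLite does not accept "#" in an unquoted name.

```python
import sqlite3

# Hypothetical sample data; EMPNO and DEPTNO stand in for EMP# and DEPT#.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("E1", "D3", 40000), ("E2", "D3", 50000), ("E3", "D1", 30000)],
)

def run(sql):
    """Return the result of a query as a list of tuples."""
    return conn.execute(sql).fetchall()

# table-expression: several rows, several columns
table_result = run("SELECT * FROM EMP WHERE DEPTNO = 'D3' ORDER BY EMPNO")
# column-expression: several rows, a single column
column_result = run("SELECT EMPNO FROM EMP WHERE DEPTNO = 'D3' ORDER BY EMPNO")
# row-expression: a single row, several columns
row_result = run("SELECT * FROM EMP WHERE EMPNO = 'E2'")
# scalar-expression: a single row with a single column
scalar_result = run("SELECT AVG(SALARY) FROM EMP")

for kind, result in [("table", table_result), ("column", column_result),
                     ("row", row_result), ("scalar", scalar_result)]:
    n_rows = len(result)
    n_cols = len(result[0]) if result else 0
    print(f"{kind:7s} {n_rows} row(s) x {n_cols} column(s): {result}")
```

In every case the interface hands back a list of rows, which is one way of seeing that the column, row, and scalar cases are all degenerate tables.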

Note that these four kinds of expression correspond to the four classes of data object (table, column, row, scalar) supported by SQL -- though incidentally SQL is inconsistent as to whether its expressions yield values or references, in general. Note too that (as pointed out in [3]) the four classes of object can be partially ordered as follows:

                 table             (highest)
                V     V
           column     row
                V     V
                 scalar            (lowest)

(columns are neither higher nor lower than rows with respect to this ordering).

As explained in [3] (again), a language should provide, for each class of object it supports, at least all of the following:

* a constructor function, i.e., a means for constructing an object of the class from literal (constant) values and/or variables of lower classes;
* a means for comparing two objects of the class;
* a means for assigning the value of one object in the class to another;
* a selector function, i.e., a means for extracting component objects of lower classes from an object of the given class;
* a general, recursively defined syntax for expressions that exploits to the full any closure properties the object class may possess.

The table below shows that SQL does not really measure up to these requirements.

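  \ opn  | constructor       | compare | assign          | selector | gen expr
obj \    |                   |         |                 |          |
---------+-------------------+---------+-----------------+----------+------------
table    | no                | no      | only via        | yes      | no
         |                   |         | INSERT - SELECT |          | (see below)
---------+-------------------+---------+-----------------+----------+------------
column   | only as arg to    | no      | no              | yes      | no
         | IN (host vbles    |         |                 |          |
         | & consts only)    |         |                 |          |
---------+-------------------+---------+-----------------+----------+------------
row      | only in INSERT    | no      | only to/from    | (yes)    | no
         | & UPDATE (host    |         | set of host     |          |
         | vbles & consts    |         | scalars         |          |
         | only)             |         |                 |          |
---------+-------------------+---------+-----------------+----------+------------
scalar   | N/A               | yes     | only to/from    | (yes)    | no
         |                   |         | host scalar     |          |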

Let us consider table-expressions in more detail. The SELECT statement, which, since it yields a table, may be regarded as a table-expression (possibly of a degenerate form, e.g., as a column-expression), currently has the following structure:

   SELECT scalar-expression-commalist
   FROM   table-name-commalist
   WHERE  predicate

(ignoring numerous irrelevant details). Notice that it is just table-names that appear in the FROM clause. Completeness suggests that it should be table-expressions (as Gray puts it [8], "anything in computer science that is not recursive is no good"). This is not just an academic consideration, by the way; on the contrary, there are several practical reasons as to why such recursiveness is desirable.

First, consider the relational algebra. Relational algebra possesses the important property of closure -- that is, relations form a closed system under the operations of the algebra, in the sense that the result of applying any of those operations to any relation(s) is itself another relation. As a consequence, the operands of any given operation are not constrained to be real ("base") relations only, but rather can be any algebraic expression. Thus, the relational algebra allows the user to write nested relational expressions -- and this feature is useful for precisely the same reasons that nested expressions are useful in ordinary arithmetic.

Now consider SQL. SQL is a language that supports, directly or indirectly, all the operations of the relational algebra (i.e., SQL is relationally complete). However, the table-expressions of SQL (which are the SQL equivalent of the expressions of the relational algebra) cannot be arbitrarily nested. Let us consider the question of exactly which cases SQL does support. Simplifying matters slightly, the expression SELECT - FROM - WHERE is the SQL version of the nested algebraic expression

   projection ( restriction ( product ( table1, table2, ... ) ) )

(the product corresponds to the FROM clause, the restriction to the WHERE clause, and the projection to the SELECT clause; table1, table2, ... are the tables identified in the FROM clause -- and note that, as remarked earlier, these are simple table-names, not more complex expressions). Likewise, the expression

   SELECT ... FROM ... WHERE ...
   UNION
   SELECT ... FROM ... WHERE ...

is the SQL version of the nested algebraic expression

   union ( tabexp1, tabexp2, ... )

where tabexp1, tabexp2, ... are in turn table-expressions of the form shown earlier (i.e., projections of restrictions of products of named tables). But it is not possible to formulate direct equivalents of any other nested algebraic expressions. Thus, for example, it is not possible to write a direct equivalent in SQL of the nested expression

   restriction ( projection ( table ) )

Instead, the user has to recast the expression into a semantically equivalent (but syntactically different) form in which the restriction is applied before the projection. What this means in practical terms is that the user may have to expend time and effort transforming the "natural" formulation of a given query into some different, and arguably less "natural", representation (see Example below). What is more, the user is therefore also required to understand exactly when such transformations are valid. This may not always be intuitively obvious. For example, is a projection of a union always equivalent to the union of two projections?

Example: Given the two tables

   NYC ( EMP#, DEPT#, SALARY )
   SFO ( EMP#, DEPT#, SALARY )

(representing New York and San Francisco employees, respectively), list EMP# for all employees.

"Natural" formulation (projection of a union):

   SELECT EMP#
   FROM ( NYC UNION SFO )

SQL formulation (union of two projections):

   SELECT EMP# FROM NYC
   UNION
   SELECT EMP# FROM SFO
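In the relational algebra the two formulations are equivalent, since projection eliminates duplicates. Whether the same holds in a given SQL system depends on how that system treats duplicate rows. The sketch below is an illustration only, assuming Python's sqlite3 module (a dialect that does accept a nested table-expression in the FROM clause) and hypothetical data in which employee E2 is recorded in both offices; EMPNO and DEPTNO stand in for EMP# and DEPT#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("NYC", "SFO"):
    conn.execute(f"CREATE TABLE {name} (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER)")
# Hypothetical data: E2 appears in both tables, with different details.
conn.executemany("INSERT INTO NYC VALUES (?, ?, ?)",
                 [("E1", "D1", 40000), ("E2", "D2", 50000)])
conn.executemany("INSERT INTO SFO VALUES (?, ?, ?)",
                 [("E2", "D7", 45000), ("E3", "D2", 30000)])

def run(sql):
    return conn.execute(sql).fetchall()

# Projection of a union, with duplicates eliminated as in the algebra.
natural = run("""
    SELECT DISTINCT EMPNO
    FROM (SELECT * FROM NYC UNION SELECT * FROM SFO)
    ORDER BY EMPNO
""")

# Union of two projections.
sql_form = run("""
    SELECT EMPNO FROM NYC
    UNION
    SELECT EMPNO FROM SFO
    ORDER BY EMPNO
""")

# Projection of a union without DISTINCT: the outer SELECT keeps duplicates.
no_distinct = run("""
    SELECT EMPNO
    FROM (SELECT * FROM NYC UNION SELECT * FROM SFO)
    ORDER BY EMPNO
""")

print(natural)
print(sql_form)
print(no_distinct)
```

With this data the first two queries agree, while the third returns E2 twice, so the transformation is safe only when the user remembers how duplicates are handled.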

We remark in passing that allowing both formulations of the query would enable different users to perceive and express the same problem in different ways (ideally, of course, both formulations would translate to the same internal representation, for otherwise the choice between the two would no longer be arbitrary).

The foregoing example tacitly makes use of the fact that a simple table-reference (i.e., a table-name) ought to be just a special case of a general table-expression. Thus we wrote

   NYC UNION SFO

instead of

   SELECT * FROM NYC
   UNION
   SELECT * FROM SFO

which current SQL would require. It would be highly desirable for SQL to allow the expression "SELECT * FROM T" to be replaced by simply "T" wherever it appears, in the style of more conventional languages. In other words, SELECT should be regarded as a statement whose function is to retrieve a table (represented by a table-expression). Table-expressions per se -- in particular, nested table-expressions -- should not require the "SELECT * FROM". Among other things this change would improve the usability of the EXISTS builtin function (see later). It would also be clear that INTO and ORDER BY are clauses of the SELECT statement and not part of a table- (or column-) expression; the question of whether they can appear in a nested expression would then simply not arise, thus avoiding the need for a rule that looks arbitrary but is in fact not.

A nested table-expression is permitted -- in fact required -- in current SQL as the argument to EXISTS (but strangely enough not as the argument to the other builtin functions; this point is discussed in the next section). Nested column-expressions ("subqueries") are (a) required with the "ANY" and "ALL" operators (includes the IN operator, which is just a different spelling for = ANY); and (b) permitted with scalar comparison operators (<, =, etc.), if and only if the column-expression yields a column having at most one row. Moreover, the nested expression is allowed to include GROUP BY and HAVING in case (a) but not in case (b). More arbitrariness.


Elsewhere I have proposed some extensions to SQL to support the outer join operation [4]. The details of that proposal do not concern us here; what does concern us is the following. If the user needs to compute an outer join of three or more relations, then (a) that outer join is constructed by performing a sequence of binary outer joins (e.g., join relations A and B, then join the result and relation C); and (b) it is essential that the user indicate the sequence in which those binary joins are performed, because different sequences will produce different results, in general. Indicating the required sequence is done, precisely, by writing a suitable nested expression. Thus, nested expressions are essential if SQL is to provide direct (i.e., single-statement) support for general outer joins of more than two relations.

Another example (involving outer join again): Part of the proposal for supporting outer join [4] involves the use of a new clause, the PRESERVE clause, whose function is to preserve rows from the indicated table that would not otherwise participate in the result of the SELECT. Consider the tables

   COURSE   ( COURSE#, SUBJECT )
   OFFERING ( COURSE#, OFF#, LOCATION )

and consider the query "List all algebra courses, with their offerings if any". The following two SELECT statements (neither of which is valid in current SQL, of course) represent two attempts to formulate this query:

   SELECT ALGEBRA.COURSE#, OFF#, LOCATION
   FROM   ( SELECT COURSE#
            FROM   COURSE
            WHERE  SUBJECT = 'Algebra' ) ALGEBRA, OFFERING
   WHERE  ALGEBRA.COURSE# = OFFERING.COURSE#
   PRESERVE ALGEBRA

   SELECT COURSE.COURSE#, OFF#, LOCATION
   FROM   COURSE, OFFERING
   WHERE  COURSE.COURSE# = OFFERING.COURSE#
   AND    SUBJECT = 'Algebra'
   PRESERVE COURSE
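The PRESERVE clause is a proposal only, so the two statements cannot be run as written. Their intended behaviour can nevertheless be approximated, and the sketch below does so under stated assumptions: it uses Python's sqlite3 module, it renders PRESERVE as a LEFT OUTER JOIN (my reading of the intended semantics, not syntax from the proposal), it renames COURSE# and OFF# to COURSENO and OFFNO, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (COURSENO TEXT, SUBJECT TEXT)")
conn.execute("CREATE TABLE OFFERING (COURSENO TEXT, OFFNO TEXT, LOCATION TEXT)")
# Hypothetical data: C2 is an algebra course with no offerings,
# C3 is a non-algebra course that does have an offering.
conn.executemany("INSERT INTO COURSE VALUES (?, ?)",
                 [("C1", "Algebra"), ("C2", "Algebra"), ("C3", "Geometry")])
conn.executemany("INSERT INTO OFFERING VALUES (?, ?, ?)",
                 [("C1", "O1", "Oslo"), ("C3", "O1", "Rome")])

def run(sql):
    return conn.execute(sql).fetchall()

# First formulation: restrict COURSE to algebra in a nested expression,
# then preserve the rows of that nested result.
first = run("""
    SELECT ALGEBRA.COURSENO, OFFNO, LOCATION
    FROM (SELECT COURSENO FROM COURSE WHERE SUBJECT = 'Algebra') AS ALGEBRA
    LEFT OUTER JOIN OFFERING
      ON ALGEBRA.COURSENO = OFFERING.COURSENO
    ORDER BY ALGEBRA.COURSENO
""")

# Second formulation: preserve the rows of COURSE itself, so every course
# that fails the combined condition is kept, algebra or not.
second = run("""
    SELECT COURSE.COURSENO, OFFNO, LOCATION
    FROM COURSE
    LEFT OUTER JOIN OFFERING
      ON COURSE.COURSENO = OFFERING.COURSENO AND SUBJECT = 'Algebra'
    ORDER BY COURSE.COURSENO
""")

print(first)
print(second)
```

With this data the second result contains an extra row for C3, padded with nulls, even though C3 is not an algebra course and does in fact have an offering.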

Each of these statements does list all algebra courses, together with their offerings, for all such courses that do have any offerings. The first also lists algebra courses that do not have any offerings, concatenated with null values in the OFFERING positions; i.e., it preserves information for those courses (note the introduced name ALGEBRA, which is used to refer to the result of evaluating the inner expression). The second, by contrast, preserves information not only for algebra courses with no offerings, but also for all courses for which the subject is not algebra (regardless of whether those courses have any offerings or not). In other words, the first preserves information for algebra courses only (as required), the second produces a lot of unnecessary output. And note that the first cannot even be formulated (as a single statement) if nested expressions are not supported.

* In fact, SQL does already support nested expressions in a kind of "under the covers" sense. Consider the following example:

  Base table:

     S ( S#, SNAME, STATUS, CITY )

  View definition:

     CREATE VIEW LONDON_SUPPLIERS
         AS SELECT S#, SNAME, STATUS
            FROM   S
            WHERE  CITY = 'London'

  Query (Q):

     SELECT *
     FROM   LONDON_SUPPLIERS
     WHERE  STATUS > 50

  Resulting SELECT statement (Q'):

     SELECT S#, SNAME, STATUS
     FROM   S
     WHERE  STATUS > 50
     AND    CITY = 'London'

The SELECT statement Q' is obtained from the original query Q by a process usually described as "merging": statement Q is "merged" with the SELECT in the view definition to produce statement Q'. To the naive user this looks a little bit like magic. But in fact what is going on is simply that the reference to LONDON_SUPPLIERS in the FROM clause in Q is being replaced by the expression that defines LONDON_SUPPLIERS, as follows:

   SELECT *
   FROM   ( SELECT S#, SNAME, STATUS
            FROM   S
            WHERE  CITY = 'London' )
   WHERE  STATUS > 50
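The substitution can be checked directly in a dialect that does permit a nested expression in the FROM clause. The following is a minimal sketch, assuming Python's sqlite3 module and hypothetical supplier rows (S# is renamed SNO); it runs the query against the view, the "merged" statement, and the nested form, and compares the three results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
# Hypothetical sample suppliers.
conn.executemany(
    "INSERT INTO S VALUES (?, ?, ?, ?)",
    [("S1", "Smith", 20, "London"), ("S2", "Jones", 60, "London"),
     ("S3", "Blake", 70, "Paris"), ("S4", "Clark", 55, "London")],
)
conn.execute("""
    CREATE VIEW LONDON_SUPPLIERS AS
        SELECT SNO, SNAME, STATUS FROM S WHERE CITY = 'London'
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Q: the query as written against the view.
q = run("SELECT * FROM LONDON_SUPPLIERS WHERE STATUS > 50 ORDER BY SNO")

# Q': the "merged" statement against the base table.
q_merged = run("""
    SELECT SNO, SNAME, STATUS FROM S
    WHERE STATUS > 50 AND CITY = 'London'
    ORDER BY SNO
""")

# Nested form: the view name replaced by its defining expression.
q_nested = run("""
    SELECT * FROM (SELECT SNO, SNAME, STATUS FROM S WHERE CITY = 'London')
    WHERE STATUS > 50
    ORDER BY SNO
""")

print(q)
print(q == q_merged == q_nested)
```

All three queries return the same rows for this data, which is consistent with reading the view reference as shorthand for its defining expression.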

This explanation, though both accurate and easy to understand, cannot conveniently be used in describing or teaching SQL, precisely because SQL does not support nesting at the external or user's level.

* UNION is not permitted in a subquery, and hence (among other things) cannot be used in the definition of a view (although strangely enough it can be used to define the scope for a cursor in embedded SQL). So a view cannot be "any derivable relation", and the relational closure property breaks down. Likewise, INSERT ... SELECT cannot be used to assign the union of two relations to another relation. Yet another consequence of the special treatment given to UNION is that it is not possible to apply a builtin function such as AVG to a union. See the following section.

We conclude this discussion of SQL expressions by noting additional (and apparently arbitrary) restrictions.

The predicate C BETWEEN A AND B is equivalent to the predicate A
