A Query Language for Multidimensional Arrays: Design, Implementation, and Optimization Techniques

Leonid Libkin
AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974
E-mail: [email protected]

Rona Machlin
Dept. of Comp. and Info. Science, University of Pennsylvania, Philadelphia, PA 19104-6389, USA
E-mail: [email protected]

Limsoon Wong
Institute of Systems Science, Heng Mui Keng Terrace, Singapore 0511
E-mail: [email protected]
Abstract

While much recent research has focused on extending databases beyond the traditional relational model, relatively little has been done to develop database tools for querying data organized in (multidimensional) arrays. Perhaps because the data models of existing DBMS's do not adequately support arrays, the scientific computing community has made little use of available database technology. Instead, multidimensional scientific data is typically stored in local files conforming to various data exchange formats and queried via specialized access libraries tied in to general purpose programming languages. To allow such data to be queried using known database techniques, we design and implement a query language for multidimensional arrays. Our main design decision is to treat arrays as functions from index sets to values rather than as collection types. This leads to both clean syntax and semantics as well as simple but powerful optimization rules. We present a calculus for arrays that extends standard calculi for complex objects. We derive a higher-level comprehension style query language based on this calculus and describe its implementation in ML, including a data driver for the NetCDF data exchange format. Next, we explore some optimization rules obtained from the equational laws of our core calculus. Finally, we study the expressiveness of our calculus and prove that it essentially corresponds to adding indices explicitly in a query language for complex objects, be they based on sets, bags or lists.
1 Introduction

Data organized into multidimensional arrays arises naturally in a variety of scientific disciplines. Yet the array type has received little attention in most recent database research on data models and query languages. In their 1993 wake-up call to the database research community [21], Maier and Vance argue that lack of adequate support for arrays in existing DBMS's is one of the reasons the scientific computing community has made little use of database technology. Instead, multidimensional scientific data is typically stored in local files using specialized data exchange formats. Since most of these formats do not have query language interfaces, queries must be written in general purpose programming languages (GPPLs) using access libraries associated with the formats. Ideally, a query language interface would provide two important advantages over such an approach:
Query languages are high-level, strongly typed, and declarative, making programming a quicker, less error-prone process.
Query languages are based on small sets of specialized constructs, so they are more easily optimized, taking the burden of producing efficient code away from the programmer.
Unfortunately, the data models of most existing query systems do not support multidimensional arrays as first-class citizens. So using them to query data exchange formats might end up being more awkward and less efficient than using a GPPL. In this paper, we develop a novel data model for arrays based on the point of view that arrays are functions rather than collection types. For this model, we define a set of specialized constructs forming a calculus for arrays which we call NRCA. We then obtain a high-level declarative query language AQL based on this calculus and prove our model via an implementation of the language. The system we obtain includes an optimizer based on the equational theory provided by the model. To make things concrete, suppose we wish to answer the following query: On what days did the weather become unbearably hot in NYC last summer?
Now, suppose we have access to the following data organized as multidimensional arrays:

1. TEMP, a three-dimensional array containing hourly temperature readings at surface level over time, latitude, and longitude.
2. RELHUM, a three-dimensional array containing hourly relative humidity readings at surface level over time, latitude, and longitude.
3. WSPEED, a four-dimensional array containing half-hourly wind speed readings ranging over time, latitude, longitude, and altitude. Note that this file differs in both dimensionality and gridding from the first two.

Further, suppose we measure the "unbearability" of weather for a particular day and location via an algorithm which expects a 3-dimensional array (time, latitude, and longitude as indices) containing hourly temperature, relative humidity, and wind speed readings varying over a small nearby region. (It's only an unbearable day at a given location if it remains hot, humid, and breezeless over several hours and there is no cool place to escape to nearby!)

How efficiently can this query be answered, if at all, by database query languages without arrays? There are several reasons why this query might be hard to answer. First, input data with different dimensionalities and grids must be correlated (WSPEED must go from a half-hourly grid to an hourly grid; also, its dimensionality must be reduced by taking a cross section at its surface-level altitude). Second, the three arrays must be combined into one array, putting elements from corresponding indices together (a 3-way, 3-dimensional zip). Next, subslabs (generalized subsequences) must be taken to get the daily readings in the vicinity of NYC necessary for the "unbearability" index algorithm. Finally, this algorithm must be applied to the arrays so obtained to select those days that were unbearable. Here is the query in AQL (this assumes the readings for last summer have already been input into the variables T, R, W, and daysinsummer, which have already been declared):

: fn \u =>           (* input unbearability threshold *)
:: fn (\x,\y) =>     (* input longitude and latitude *)
:: {d | \d ...

- TopEnv.RegisterCO("june_sunset",
=   CO.Funct(fn co => CO_Nat.mK(sunset(6, CO_Nat.Km(co), 95))),
=   Type.Arrow(Type.Nat, Type.Nat));
Here we have used our system's complex object interface to translate the SML function sunset into a complex object function. The call to RegisterCO then makes this complex object function known to AQL as the primitive june_sunset. We now enter the AQL top-level read-eval-print loop and define a macro which we will use to index into the NetCDF file:

- AQL();
: macro \days_since_1_1 =
::   let val \months = [[0,31,28,31,30,31,30,31,31,30,31,30]]
::   in
::     fn (\m,\d,\y) =>
::       d + summap(fn \i => months[i])!(gen!m) +
::       if m>2 and y%4=0 then 1 else 0
::   end;
typ days_since_1_1 : nat * nat * nat -> nat
val days_since_1_1 = days_since_1_1 registered as macro.
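Before unpacking the AQL notation used here, it may help to see the same arithmetic written as a plain SML function. This is only an illustrative sketch of ours; it is not part of the AQL session, and the function name is invented:

(* Days elapsed since January 1 of year y, for a date (m, d, y).          *)
(* months mirrors the macro above: a leading 0, then the lengths of       *)
(* January through November; December is never needed.                    *)
val months = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]

fun daysSince1_1 (m, d, y) =
  let
    (* total length of the months strictly before month m *)
    val daysBefore = List.foldl (op +) 0 (List.take (months, m))
    (* one extra day after February in a leap year, as in the macro *)
    val leap = if m > 2 andalso y mod 4 = 0 then 1 else 0
  in
    d + daysBefore + leap
  end

(* Example: daysSince1_1 (6, 1, 95) evaluates to 152, the day number of June 1. *)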
This macro takes an input date and computes the number of days since the beginning of the year. (The AQL notation fn P => e defines lambda abstraction, ! is the AQL notation for function application, and summap(f)!e is the AQL notation for Σ{f(x) | x ∈ e}.) We next read the June data from the NetCDF file, using the macro we just defined to compute the index range for time. We assume the latitude and longitude indices are provided by another index-computing macro previously defined for this NetCDF file:
: val \lat = get_lat_index("New York");
typ lat : nat
val lat = 23

: val \lon = get_long_index("New York");
typ lon : nat
val lon = 5

: readval \T using netCDF at ("temp.nc", "temp",
::   (days_since_1_1!(6,1,95)*24, lon, lat),
::   (days_since_1_1!(6,30,95)*24, lon, lat));
typ T : [[real]]_3
val T = [[(0,0,0):67.3, (0,0,1):67.3, (0,0,2):67.2, ...]]
T now contains hourly data for June at the given latitude and longitude. We finally execute our query, using the newly registered june_sunset primitive:
: {d | [\h:\t] june_sunset_hour(d), t > 85};
typ it : {nat}
val it = {25,27,28}
That is, there were three days in June when the temperature went over 85 after sunset. Note that in addition to anonymous query evaluation, three types of declarations are supported at the AQL top-level: macro declarations, which keep track of queries and can be used to define new primitives; val declarations, which keep track of complex objects and can be used to define persistent data; and read commands, which read in data as values. Also note that our system provides two views of AQL. At the SML top-level, a user can make calls to any of our library routines. These routines provide support for customizing the AQL top-level environment to specific application domains. At the AQL top-level, the user can enter AQL declarations and queries. Because SML has an interactive compiler, the user can go back and forth between these two views of the system.
4.2 General Architecture

The general architecture of our system is shown in figure 3. It is based on the architecture of CPL/Kleisli, an open query system implementing NRC; see [5, 33]. Our system is divided into four main subsystems: a query module, which manages query representation and compilation; an object module, which supports query evaluation via a complex object library; an I/O module, which provides access to local and remote data; and an environment module, which facilitates customization of the system to specific application domains.
Figure 3: The AQL System Architecture
Query Processing. When a query is executed, it produces a complex object as its result. The compilation of queries and their evaluation into complex objects is handled by the query and object modules. The details of this process are shown in figure 4. An AQL query is first parsed into an internal representation of the surface syntax, AQLabst. After undergoing some simple syntactic checks, the query is translated into a second internal representation (NRCAabst). NRCAabst is just abstract syntax for our core calculus, so the translation consists of eliminating comprehensions, patterns (cf. figure 2), blocks and other syntactic sugar, as described in the previous section. The core calculus query is now sent through a typechecker. Next, in preparation for optimization, any macros defined in the top-level environment are substituted in. The query is now optimized and the resulting query is evaluated into a complex object. Query evaluation proceeds by translating core calculus constructs into calls to routines in a complex object library. The routines act on an abstract representation of complex objects that resembles our definitions for literals. Once a result has been computed, this abstract representation of complex objects is translated to the surface syntax for complex objects via a pretty printer.

Openness and the Top-Level Environment. Our system has an open architecture: new external functions, data readers/writers, and optimization rules can all be added dynamically to the AQL top-level environment by calling appropriate registration routines provided in the environment module. Once registered, external functions and readers/writers are immediately available as new primitives within the AQL top-level read-eval-print loop, and rules/cost functions are immediately available to the optimizer.
Figure 4: Query Processing
In the example given above, we showed how to add the new external function sunset. Since this function was written in SML, we could use the complex object interface to treat the SML function as a complex object function. We can inject primitives implemented in languages other than SML by treating these functions as data readers. Data readers communicate their data by depositing it on the input stream using the data exchange format for complex objects we described earlier. Thus any function producing output in this manner can appear as a reader to the AQL input server.

I/O and the NetCDF Interface. By providing a standard data exchange format for complex objects, we help make the system open. Any driver which produces a stream of bytes in this format can quickly be plugged into our system by registering it as a new reader. We have implemented a driver for NetCDF which produces a stream of bytes in the complex object data exchange format. Since NetCDF files are stored locally, it is possible to avoid the serialization of their data into a byte stream. We are currently investigating the possibility of adding another NetCDF driver which deposits its data directly into AQL complex objects.
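The query-processing path described above is essentially a linear composition of stages. The following SML signature summarizes that ordering; it is purely an illustrative sketch of ours, and the type and function names do not correspond to the actual AQL modules:

signature STAGES = sig
  type aqlAbst                          (* surface abstract syntax (AQLabst)        *)
  type nrcaAbst                         (* core-calculus abstract syntax (NRCAabst) *)
  type coObj                            (* complex objects                          *)
  val parse     : string -> aqlAbst     (* lex and parse the surface query          *)
  val translate : aqlAbst -> nrcaAbst   (* remove comprehensions, patterns, sugar   *)
  val typecheck : nrcaAbst -> nrcaAbst
  val expand    : nrcaAbst -> nrcaAbst  (* substitute registered macros             *)
  val optimize  : nrcaAbst -> nrcaAbst
  val evaluate  : nrcaAbst -> coObj     (* calls into the complex object library    *)
  val pretty    : coObj -> string       (* surface syntax for the result object     *)
end

functor Pipeline (S : STAGES) = struct
  (* parse runs first, pretty-printing last; each stage feeds the next. *)
  fun run (q : string) : string =
    (S.pretty o S.evaluate o S.optimize o S.expand o S.typecheck o S.translate o S.parse) q
end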
5 Normalization Algorithm and Optimizations

Optimizations and normalizations are performed at compile time on the abstract syntax representations of NRCA constructs, as shown in Figure 4. The optimizer proceeds in a number of phases. In each phase, a set of rules is applied. The rule bases, the rule application strategies, and the number of phases of this optimizer are extensible. We will discuss only the normalization phase of the optimizer. The normalization rules for sets, tuples, and conditionals come from the equational theory of NRC, described in [7, 34, 33]. They include rules for vertical and horizontal fusion of set loops, filter promotion, and column reduction [5]. The rules for summation and arithmetic come from an extension of NRC to arithmetic given in [17]. Here, we
describe the new rules for arrays. Since the syntax for arrays was inspired by viewing them as functions, it is not surprising that the rules for arrays are also based on this view of arrays as (partial) functions. There are three rules for arrays: ( ) ( ) ( ) p
p
p
[ e1 j i < e2 ] [e3 ] ; if e3 < e2 then e1 fi := e3 g else ? [ e[i] j i < len (e) ] ; e len ([[ e1 j i < e2 ] ) ; e2
The first two are partial versions of the β and η rules of the lambda calculus, cf. [2]. The third rule corresponds to partial function domain extraction. In the context of arrays, the first rule can be interpreted as saying that to compute the value of the array tabulated by [[ e1 | i < e2 ]] at index e3, it suffices to compute just the e3-th value of the array, after checking that e3 would have been within bounds. This rule saves both time and space by avoiding tabulation (i.e., materialization) of the intermediary array. The second rule can be interpreted as saying that the array tabulated from another array e by using all its values in order is just the array e. Once again this rule saves time and space by avoiding retabulation of the array. The third rule says that to get the length of the array tabulated by [[ e1 | i < e2 ]], you don't need to tabulate this array, you only need to compute e2. The three rules for arrays generalize to the k-dimensional case as follows:

    [[ e | i1 < e1, ..., ik < ek ]][e1', ..., ek']
        ⇝  if e1' < e1 then ... if ek' < ek then e{i1 := e1', ..., ik := ek'} else ⊥ ... else ⊥
    [[ e[i1, ..., ik] | i1 < dim_1(e), ..., ik < dim_k(e) ]]  ⇝  e
    dim([[ e | i1 < e1, ..., ik < ek ]])  ⇝  (e1, ..., ek)
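To make these rules concrete, the following SML sketch applies the one-dimensional rules to a small, made-up abstract syntax. It is our own illustration, not the representation used by the AQL optimizer; capture-avoiding substitution and the side condition of the second rule (i must not occur free in the indexed array) are only noted in comments.

(* A toy fragment of NRCA expressions. Tab (i, e1, e2) stands for [[ e1 | i < e2 ]]. *)
datatype expr =
    Var   of string
  | Nat   of int
  | Bool  of bool                  (* used by the bounds-check sketch below *)
  | Tab   of string * expr * expr  (* tabulation [[ e1 | i < e2 ]]           *)
  | Index of expr * expr           (* e1[e2]                                 *)
  | Len   of expr                  (* len(e)                                 *)
  | If    of expr * expr * expr
  | Less  of expr * expr
  | Bottom                         (* the error value ⊥                      *)

(* subst (e, i, e') is e{i := e'}; capture avoidance is omitted for brevity. *)
fun subst (Var x, i, e')          = if x = i then e' else Var x
  | subst (Nat n, _, _)           = Nat n
  | subst (Bool b, _, _)          = Bool b
  | subst (Tab (j, e1, e2), i, e') =
      if j = i then Tab (j, e1, subst (e2, i, e'))   (* i is rebound inside e1 *)
      else Tab (j, subst (e1, i, e'), subst (e2, i, e'))
  | subst (Index (a, b), i, e')   = Index (subst (a, i, e'), subst (b, i, e'))
  | subst (Len a, i, e')          = Len (subst (a, i, e'))
  | subst (If (a, b, c), i, e')   = If (subst (a, i, e'), subst (b, i, e'), subst (c, i, e'))
  | subst (Less (a, b), i, e')    = Less (subst (a, i, e'), subst (b, i, e'))
  | subst (Bottom, _, _)          = Bottom

(* One top-level application of the three one-dimensional rules. *)
fun rewrite (Index (Tab (i, e1, e2), e3)) =            (* first rule  *)
      If (Less (e3, e2), subst (e1, i, e3), Bottom)
  | rewrite (Len (Tab (_, _, e2))) = e2                (* third rule  *)
  | rewrite (e as Tab (i, Index (a, Var i'), bound)) = (* second rule; the check that *)
      if i = i' andalso bound = Len a then a else e    (* i is not free in a is omitted *)
  | rewrite e = e

For instance, rewrite (Index (Tab ("i", Var "i", Nat 10), Nat 3)) yields If (Less (Nat 3, Nat 10), Nat 3, Bottom), mirroring the first rule.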
These rules seem quite obvious and simple, but are they enough for optimizing arrays? Recall that in section 3, we claimed that it was not necessary to add a primitive for transpose in order to capture the rule

    transpose([[ e | i < m, j < n ]])  ⇝  [[ e | j < n, i < m ]]

Assuming the definition of transpose we gave in section 2, we now show that this rule is derivable (up to redundant constraint checks), using only the rules we have given so far. In addition to the rules for arrays, our derivation uses two rules we inherit from NRC: the β rule for functions and the projection rule for products.

    transpose([[ e | i < m, j < n ]])
      =  (λA. [[ A[i', j'] | j' < π_2(dim_2 A), i' < π_1(dim_2 A) ]]) ([[ e | i < m, j < n ]])
      ⇝  [[ [[ e | i < m, j < n ]][i', j'] | j' < π_2(dim_2 [[ e | i < m, j < n ]]), i' < π_1(dim_2 [[ e | i < m, j < n ]]) ]]
      ⇝  [[ [[ e | i < m, j < n ]][i', j'] | j' < π_2(m, n), i' < π_1(m, n) ]]
      ⇝  [[ [[ e | i < m, j < n ]][i', j'] | j' < n, i' < m ]]
      ⇝  [[ if i' < m then (if j' < n then e{i := i', j := j'} else ⊥) else ⊥ | j' < n, i' < m ]]
Observe that in the last expression, both if conditions must necessarily hold, because they are only evaluated if i' and j' are within bounds, i.e., if j' < n and i' < m. So these constraint checks are redundant. If we could get rid of these redundant constraint checks, then we would end up with the expression [[ e{i := i', j := j'} | j' < n, i' < m ]], which is just the right-hand side of the transpose rule given above (up to variable renaming). Similar reasoning shows that the identity zip ∘ (subslab, subslab) = subslab ∘ zip given in the introduction is derivable up to extra constraint checks and variable renaming. In general, the constraint checks introduced by the first rule will be redundant as long as no bounds errors were present in the original code. The question arises whether all redundant constraint checks can be removed by further optimization rules. The answer is no, since, as we observed earlier, bounds checking is undecidable. However, the following rules will eliminate some redundant checks:

    (bc 1)  [[ (... (ij < ej) ...) | i1 < e1, ..., ik < ek ]]  ⇝  [[ (... true ...) | i1 < e1, ..., ik < ek ]]
    (bc 2)  if e then (... e ...) else e'  ⇝  if e then (... true ...) else e'
    (bc 3)  if e then e' else (... e ...)  ⇝  if e then e' else (... false ...)
    (bc 4)  ⋃{ (... i < e ...) | i ∈ gen(e) }  ⇝  ⋃{ (... true ...) | i ∈ gen(e) }

(These rules need some extra conditions guaranteeing that free variables in (... e ...) or (... ij < ej ...) are not captured.)
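Continuing the toy representation from the earlier sketch, rules such as (bc 2) and (bc 3) can be read as replacing, inside a branch of a conditional, every occurrence of the tested condition by its known truth value. Again, this is only an illustration under our own invented names, and it ignores variable-binding subtleties:

(* replace (target, by) e substitutes by for every subexpression of e equal to target. *)
fun replace (target, by) e =
  if e = target then by
  else case e of
         Tab (j, e1, e2) => Tab (j, replace (target, by) e1, replace (target, by) e2)
       | Index (a, b)    => Index (replace (target, by) a, replace (target, by) b)
       | Len a           => Len (replace (target, by) a)
       | If (a, b, c)    => If (replace (target, by) a, replace (target, by) b,
                                replace (target, by) c)
       | Less (a, b)     => Less (replace (target, by) a, replace (target, by) b)
       | other           => other

(* (bc 2) and (bc 3) at the top of a conditional: the condition is known to be *)
(* true in the then-branch and false in the else-branch.                       *)
fun elimChecks (If (c, t, e)) =
      If (c, replace (c, Bool true) t, replace (c, Bool false) e)
  | elimChecks e = e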
We have implemented normalization and constraint elimination as the first two phases of our optimizer. Later phases include optimization for I/O and code motion.
6 Expressive power of the array query language

So far we have presented the array query language AQL based on the calculus NRCA that combines complex objects and multidimensional arrays. A natural question to ask is the following. How much expressiveness do we gain by adding arrays to the complex object language? In this section we give a precise answer to this question. We show that adding arrays amounts to adding the following to a pure relational query language:

1. A general template for producing aggregate functions.
2. A generator for initial intervals of natural numbers.

While this gives us a precise answer to the question above, such a characterization of the expressive power is not very intuitive. In particular, it does not connect very well with arrays. So we shall provide an alternative characterization. Before presenting the technical details, we can state it informally: adding arrays to a complex object language amounts to adding indices explicitly and uniformly across sets, bags and lists.

We need some new terminology first. A language NRC_aggr is defined to be the fragment of NRCA that contains all set operations, the arithmetic operations +, ·, and ∸ (modified subtraction), and the summation operator Σ. As explained before, the arithmetic plus the summation operator allow us to express aggregates such as total and count. Using nesting, we can express groupby, which is another means of aggregation in SQL. This language can be viewed as a "theoretical reconstruction" of SQL. Indeed, it has both features that distinguish all implementations of SQL from purely relational languages, that is, groupby and aggregate functions. In fact, NRC_aggr was used in [19] to study limitations of the expressive power of SQL. Our first result characterizes the expressive power of NRCA as that of NRC_aggr(gen) (we list extra primitives in parentheses). We also look at the class of queries from flat relations (sets of tuples that do not involve sets) to flat relations expressible in that language.
Theorem 6.1 The languages NRCA and NRC_aggr(gen) have the same expressive power. Furthermore, NRC_aggr(gen) is a conservative extension of its flat fragment: any NRC_aggr(gen)-query from flat relations to flat relations can be expressed using just relational calculus, arithmetic operations, summation and gen. □
Since the languages NRCA and NRC_aggr(gen) have different type systems, the above equivalence is modulo some translation between the type systems. For the nontrivial inclusion NRCA ⊆ NRC_aggr(gen) it must translate away the arrays and errors. Here we just hint at how this translation works by showing a translation, written o°, of NRCA objects o into NRC_aggr objects. For simplicity, we deal with pairs rather than general tuples and with one-dimensional arrays only. The translation is as follows:

    x°                         =  {x}                               for x of base type
    {x1, ..., xn}°             =  {x1°, ..., xn°}
    (x, y)°                    =  {(x°, y°)}
    [[ e0, ..., e_{n-1} ]]°    =  {(e0°, 0), ..., (e_{n-1}°, n-1)}
    ⊥°                         =  {}

To prove the equivalence modulo these translations, we use the algebras of functions corresponding to the calculi we have presented. These algebras are derived from the calculi in the same manner as relational algebra is derived from relational calculus. The algebra of functions corresponding to NRC_aggr has been presented in [18]. For NRCA we derive a similar algebra by adding a number of functions to handle the array operations. For example, there is a function mk_arr(f) : N → [[t]], provided f is of type N → t. Applied to a number n it yields [[ f(i) | i < n ]]. Using the algebras of functions, we show that they can be translated into each other, and are thus equivalent modulo the translation above.

To give a more intuitive characterization of the expressive power of NRCA, we follow the idea of [4], and replace the construct ⋃{e1 | x ∈ e2} with

    Γ, x : s, i : N ⊢ e1 : {t}        Γ ⊢ e2 : {s}
    ----------------------------------------------
             Γ ⊢ ⋃_i {e1 | x ∈ e2} : {t}
that has the following semantics. Assume that e2 is a set {x1, ..., xn} such that x1 <_s ... <_s xn (recall that <_s is a linear ordering on objects of type s), and that f is the function λ(x, i).e1. Then ⋃_i {e1 | x ∈ e2} evaluates to f(x1, 1) ∪ ... ∪ f(xn, n). The subscript "i" stands for index, which is bound together with the variable in this construct. For example, sort(X) = ⋃_i { {(x, i)} | x ∈ X } "sorts" a set: if X = {x1, ..., xn} with x1 <_s ... <_s xn, then sort(X) evaluates to {(x1, 1), ..., (xn, n)}.
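The following SML sketch makes this semantics concrete for one particular representation: sets are modelled as duplicate-free lists, the linear order <_s is supplied as a comparison function cmp, and the last function shows the earlier array-to-set translation restricted to one dimension (an array becomes the set of its (value, index) pairs). All names are ours and the code is only illustrative:

(* Insertion sort by cmp, ascending. *)
fun insert cmp x [] = [x]
  | insert cmp x (y :: ys) =
      if cmp (x, y) = GREATER then y :: insert cmp x ys else x :: y :: ys
fun isort cmp s = List.foldl (fn (x, acc) => insert cmp x acc) [] s

(* Union of two sets represented as duplicate-free lists. *)
fun union (xs, ys) =
  xs @ List.filter (fn y => not (List.exists (fn x => x = y) xs)) ys

(* The indexed union: enumerate the set in increasing order, pairing each  *)
(* element with its rank, and union the results f(x1,1), ..., f(xn,n).     *)
fun indexedUnion cmp f s =
  let
    val sorted = isort cmp s
    fun go (_, [])      = []
      | go (i, x :: xs) = union (f (x, i), go (i + 1, xs))
  in
    go (1, sorted)
  end

(* sort(X) = U_i { {(x, i)} | x in X } pairs each element with its rank.   *)
(* Example: sort Int.compare [30, 10, 20] evaluates to [(10,1),(20,2),(30,3)]. *)
fun sort cmp s = indexedUnion cmp (fn (x, i) => [(x, i)]) s

(* One-dimensional arrays translate to the set of their (value, index) pairs. *)
fun arrayAsSet xs = ListPair.zip (xs, List.tabulate (length xs, fn i => i))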
Theorem 6.2 The language obtained by adding the type of natural numbers, gen, and the ⋃_i {e1 | x ∈ e2} construct to NRC has the same expressive power as NRCA. □
That is, the gain in expressiveness obtained by adding arrays to a complex object language is precisely characterized as adding indices in an explicit manner. Furthermore, we are going to show that the same result holds if we consider bag-based or list-based complex objects. First, we define analogs of the nested relational calculus NRC for bags and lists. These are called NBC and NLC. In the type system, the set type is replaced by the bag type and the list type respectively. We use {| |} as bag brackets and [ ] as list brackets. The union operation is ⊎ for bags (it adds up multiplicities) and append ⊔ for lists. Finally, the ⋃{e1 | x ∈ e2} construct is replaced by ⨄{| e1 | x ∈ e2 |} for bags and ⨆[ e1 | x ∈ e2 ] for lists. The semantics is the same as before except that the operation ⊎ is used for bags and ⊔ for lists. The language NBC and its extensions have been studied extensively in the past few years, see [12, 18, 19]. The language NLC can be used as a starting point for designing list query languages, but by itself it does not possess enough power. For example, head and tail of a list cannot be defined in it.

Now we define the "indexed" analogs of the ⨄ and ⨆ operations. The first one, ⨄_i {| e2 | x ∈ e1 |}, is defined in exactly the same way as the corresponding operation for sets. For lists, we define ⨆_i [ e2 | x ∈ e1 ], where i is bound to the position of x in the list. For example, ⨆_i [ if i = 1 then [ ] else [x] | x ∈ L ] is the tail of the list L. Now we let NBC_i stand for NBC augmented with ⨄_i {| e2 | x ∈ e1 |}, and we let NLC_i stand for NLC augmented with ⨆_i [ e2 | x ∈ e1 ]. We do not add the type of natural numbers explicitly to those languages because the number n can be simulated as a bag/list of n identical elements. The following result, together with Theorem 6.2, justifies our claim that adding arrays gives us the same extra power as adding indices uniformly across sets, bags and lists.
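A direct SML reading of the indexed list construct and of the tail example is short enough to give in full; again the function names are ours and serve only as an illustration:

(* The indexed list construct: i is bound to the (1-based) position of x,  *)
(* and the results are joined with list append, which plays the role of ⊔. *)
fun indexedConcat f l =
  let
    fun go (_, [])      = []
      | go (i, x :: xs) = f (x, i) @ go (i + 1, xs)
  in
    go (1, l)
  end

(* tail(L) = ⨆_i [ if i = 1 then [] else [x] | x ∈ L ]                     *)
(* Example: tail [10, 20, 30] evaluates to [20, 30].                        *)
fun tail l = indexedConcat (fn (x, i) => if i = 1 then [] else [x]) l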
Theorem 6.3 Both NBC_i and NLC_i have the same expressive power as NRCA. □
7 Conclusions and future work

Multidimensional arrays are needed for natural representations of many scientific data types. However, multidimensional arrays are not well supported by commercial database systems or by theoretical database research. As a result, multidimensional scientific data is usually kept in flat files conforming to various data exchange formats such as NetCDF and is queried via a collection of inflexible specialized library routines tied into some general purpose programming languages. In this paper, we aim to provide a database technology for flexibly querying and transforming multidimensional arrays. We have developed a high-level comprehension style query language, AQL, for multidimensional arrays. We have also implemented a data driver to mediate between our query language and the popular NetCDF data exchange format for scientific data. Thus our query language can be used to directly manipulate a large amount of "legacy" scientific data. We have also derived many useful rules from the equational laws of AQL. We have implemented these rules in our AQL optimizer. Finally, we have investigated the expressive power of our query language for arrays. In particular, we have shown that its expressiveness corresponds to adding indices explicitly in a query language for complex objects, be they based on sets, bags, or lists.

Some problems still remain. We list two of them below. Firstly, as mentioned earlier, we currently use stream-based I/O for external arrays. We would like to investigate techniques for providing more direct access to these arrays, perhaps through the use of good predictive caching. Secondly, AQL currently supports initial segments of natural numbers as array indices. We would like to investigate techniques for providing more meaningful data types such as longitudes and latitudes as indices for scientific arrays. Eventually, we would like to allow arbitrary linearly-ordered types to be used as array indices.
References

[1] Arvind, R. S. Nikhil, and K. K. Pingali. I-structures: Data structures for parallel computing. In J. Fasel and R. M. Keller, editors, Proceedings of the Graph Reduction Workshop. Springer-Verlag, 1987.
[2] H. Barendregt. The Lambda Calculus: Its Syntax and Semantics. North Holland, 1984.
[3] C. Beeri and D. K. C. Chan. Bounded arrays: a bulk type perspective. Technical report, Hebrew Univ., 1995.
[4] P. Buneman. The fast Fourier transform as a database query. Technical Report MS-CIS-93-37/L&C 60, University of Pennsylvania, March 1993.
[5] P. Buneman, S. Davidson, K. Hart, C. Overton, and L. Wong. A data transformation system for biological data sources. In VLDB'95, pages 158-169.
[6] P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87-96, March 1994.
[7] P. Buneman, S. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theoretical Computer Science, 149(1):3-48, September 1995.
[8] R. G. G. Cattell, ed. The Object Database Standard: ODMG-93. Morgan-Kaufmann, 1994.
[9] L. Fegaras and D. Maier. Towards an effective calculus for object query languages. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 47-58, 1995.
[10] J. Feo. Arrays in Sisal. In Proc. Workshop on Arrays, Functional Languages and Parallel Systems, L. Mullin et al., eds., Kluwer Academic Publishers, 1990.
[11] S. Greco, P. Palopoli, and E. Spadafora. Datalog^A: Array manipulations in a deductive database language. In Proc. 4th Conf. on Database Systems for Advanced Applications, pages 180-188, 1995.
[12] S. Grumbach and T. Milo. Towards tractable algebras for bags. In PODS'93, pages 49-58.
[13] P. Hammarlund and B. Lisper. On the relation between functional and data parallel programming languages. In Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, pages 210-219, June 1993.
[14] P. Hudak and P. Wadler. Report on the programming language Haskell. Technical Report, Glasgow University, Glasgow G12 8QQ, Scotland, April 1990.
[15] K. E. Iverson. A Programming Language. Wiley, New York, 1962.
[16] T. W. Leung, G. Mitchell, B. Subramaniam, B. Vance, S. Vandenberg, and S. B. Zdonik. The AQUA data model and algebra. In Proc. 4th Int. Workshop on Database Programming Languages, September 1993. Springer-Verlag, 1994.
[17] L. Libkin and L. Wong. Aggregate functions, conservative extensions, and linear orders. In Proc. 4th Int. Workshop on Database Programming Languages, September 1993. Springer-Verlag, 1994.
[18] L. Libkin and L. Wong. Some properties of query languages for bags. In Proc. 4th Int. Workshop on Database Programming Languages, September 1993. Springer-Verlag, 1994.
[19] L. Libkin and L. Wong. New techniques for studying set languages, bag languages and aggregate functions. In PODS'94, pages 155-166.
[20] L. Libkin and L. Wong. Conservativity of nested relational calculi with internal generic functions. Information Processing Letters, 49(6):273-280, March 1994.
[21] D. Maier and B. Vance. A call to order. In Proceedings of 12th ACM Symposium on Principles of Database Systems, pages 1-16, Washington, D.C., May 1993.
[22] D. Maier and D. Hansen. Bambi meets Godzilla: Object databases for scientific computing. In Proc. 7th Working Conference on Scientific and Statistical Database Management, 1994, pages 176-184.
[23] G. Mecca and A. Bonner. Sequences, datalog and transducers. In PODS'95, pages 23-35.
[24] G. Mecca and A. Bonner. Finite query languages for sequence databases. In DBPL'95, to appear.
[25] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. The MIT Press, Cambridge, Mass., 1990.
[26] T. More. Axioms and theorems for a theory of arrays. IBM J. Res. and Development, 17 (1973), 135-175.
[27] R. Rew, G. Davis, and S. Emmerson. NetCDF User's Guide. Unidata Program Center, 1993.
[28] P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. In SIGMOD'94.
[29] P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: a model for sequence databases. Technical report, Univ. of Wisconsin, 1994.
[30] S. Vandenberg. Algebras for Object-Oriented Query Languages. PhD thesis, Univ. of Wisconsin, 1993.
[31] S. Vandenberg and D. DeWitt. Algebraic support for complex objects with arrays, identity and inheritance. In SIGMOD'91, pages 158-167.
[32] P. Wadler. Comprehending monads. In Proc. ACM Conf. on Lisp and Functional Progr., Nice, June 1990.
[33] L. Wong. Querying Nested Collections. PhD thesis, University of Pennsylvania, August 1994.
[34] L. Wong. Normal forms and conservative extension properties for query languages over collection types. JCSS, to appear. Extended abstract in PODS'93.