Reasoning with Aggregation Constraints in Views Shaul Dar
H. V. Jagadish
AT&T Bell Laboratories
[email protected]
AT&T Bell Laboratories
[email protected]
AT&T Bell Laboratories
[email protected]
AT&T Bell Laboratories
[email protected]
Alon Y. Levy
Divesh Srivastava
Abstract
We investigate the problem of using materialized views to compute answers to SQL queries with grouping and aggregation, in the presence of multiset tables. This problem is important in many applications, such as data warehousing, mobile computing, global information systems, and maintaining physical data independence, where access to local or cached materialized views may be cheaper than access to the underlying database. In addition, this problem has obvious potential in optimizing query evaluation. The problem is formally stated as nding a rewriting of an SQL query Q where the materialized views occur in the FROM clause, and the rewritten query is multiset-equivalent to Q. First, we study the case where the query has grouping and aggregation but the views do not, and show that usability of a view in evaluating a query essentially requires an isomorphism between the view and a portion of the query. We present a rewriting algorithm that generates all possible rewritings of the query using the views (for the case of equality predicates); when using multiple views, considering the views iteratively in any order yields all possible rewritings. Second, we study the case where the query and the views both have grouping and aggregation, identify the conditions under which the aggregation information present in a view is sucient to perform the aggregate computations required in the query, and present a rewriting algorithm. Third, we outline how our techniques can be extended to take advantage of set-valued queries and views in the presence of keys or SELECT DISTINCT.
Keywords: Aggregation, materialized views, query optimization, data warehousing, mobile computing
1 Introduction We investigate the problem of using materialized views to compute answers to SQL queries with grouping and aggregation. This problem has the potential of improving the performance of SQL query evaluation in general. It may have even greater impact on the optimization of queries in applications such as data warehousing, mobile computing, global information systems, and maintaining physical data independence, where access to (local or cached) materialized views may be cheaper than access to the underlying database. For example, in mobile computing applications [BI94, HSW94] the database relations may be stored on a server and be accessible only via low bandwidth wireless communication, which may 1
additionally become unavailable. In globally distributed information systems [LSK95] the relations may be distributed or replicated, and locating as well as accessing them may be expensive. Locally cached materialized views of the data, such as the results of previous queries, may improve the performance of such applications. Data warehousing [GJM96, ZGMHW95], view adaptation [GMR95] and physical data independence [TSI94] involve the usage of materialized views (conceptual relations) to integrate data from multiple sources or to hide schema dierences or physical storage details. The underlying relations may be expensive to access or may not be available, requiring queries against these relations to be processed using the materialized views. Finally, in very large transaction recording systems [Dat94, JMS95] the volume of incoming data and the size of the database may be very large. Queries against this data typically involve aggregation. Such queries may be answered more eciently by materializing and maintaining appropriately de ned aggregate views (summary tables), which are much smaller then the underlying data and can be cached in faster memory. We formalize the problem of using materialized SQL views to answer SQL queries as nding a rewriting of a query Q where the materialized views occur in the FROM clause, and the rewritten query is multiset-equivalent to Q. The main technical challenges arise from the multiset semantics of SQL, in conjunction with the use of grouping and aggregation. We study SQL queries and views of the form \SELECT-FROM-WHERE-GROUPBY-HAVING", where the SELECT clause may contain one of the SQL aggregate functions MIN, MAX, SUM or COUNT. We focus on single block queries and views1 , and do not assume the availability of any meta-information about the database schema, such as keys or functional dependencies. Three cases can be distinguished: (1) the query has grouping and aggregation, but the views do not; (2) both the query and the views have grouping and aggregation; and (3) the views have grouping and aggregation, but the query does not. The contributions of this paper are developed in a step-wise fashion, as follows:
1
First, in Section 3, we study the case where the query has grouping and aggregation but the views do not. We show that usability of a view in evaluating a query essentially requires an isomorphism between the view and a portion of the query. These usability conditions are also applicable to the case when the query does not have grouping and aggregation. We provide a rewriting algorithm that generates all possible rewritings of the query using the views, when only equality predicates are used. Furthermore, the algorithm can be iteratively applied to incorporate multiple views | considering the views in any order yields all possible rewritings. Second, in Section 4, we study the case where the query and the views both have grouping and aggregation. We extend the conditions for usability in a natural fashion to recognize when the aggregation information present in a view is sucient to perform the aggregate computations required in the query, and provide a rewriting algorithm for the query. For the case where the views have grouping and aggregation but the query does not, we show in Section 4.5 that it is not possible in general to use the views to evaluate the query. Third, in Section 5, we outline some simpli cations to the usability conditions that utilize information about set-valued queries and views, such as the existence of keys or SELECT DISTINCT.
In particular, we assume that there are no view tables in the FROM clause of the query or the view.
2
There has been previous work on using views to answer queries (e.g., [YL87, CKPS95, SJGP90, TSI94, CR94]), but the formal aspects of nding the equivalent rewritings for SQL queries with multiset semantics, grouping and aggregation, have received little attention. Most of the work has focused on conjunctive queries (i.e., without grouping and aggregation) with set semantics. One exception is the work by [CKPS95], who studied the optimization of conjunctive SQL queries using conjunctive SQL views. Recently, [GHQ95] independently considered the problem of using materialized aggregation views to answer aggregation queries; their approach uses transformation rules on the query tree representations, does not provide any formal guarantees of completeness, and does not deal with many of the cases covered by our approach (such as Example 1.1). A restricted problem is studied in [GMR95], who assume that a materialized view may be rede ned, and investigate how to adapt the materialization of the view to re ect the rede nition. A comparison with [CKPS95, GHQ95, GMR95] and other related work in the literature is presented in Section 6.
1.1 Motivating Example We present an example from data warehousing in a telephony application to illustrate the potential performance gains when using materialized aggregation views to answer queries.
Example 1.1 (Motivating Example)
Consider a data warehouse that holds information useful to a telephone company. The database maintains the following tables:
Customer(Cust Id; Cust Name; Area Code; Phone Number), which maintains information about individual customers, Calling Plans(Plan Id; Plan Name), which maintains information about the dierent calling plans, and Calls(Call Id; Cust Id; Plan Id; Day; Month; Y ear; Charge), which maintains informationof each individual call.
The underlined column names are keys of the respective tables. Assume that the telephone company is interested in determining calling plans that have earned less than a million dollars in 1995. The following query Q may be used for this purpose: Q: SELECT Calling Plans:Plan Id; Plan Name, SUM(Charge) FROM Calls; Calling Plans WHERE Calls:Plan Id = Calling Plans:Plan Id AND Y ear = 1995 GROUPBY Calling Plans:Plan Id; Plan Name HAVING SUM(Charge) < 1000000
The telephone company also maintains materialized views that summarize the performance of each of their calling plans on a periodical basis. In particular assume that the following materialized view V1 (Plan Id; Plan Name; Month; Y ear; Monthly Earnings) is available: V1 : SELECT Calls:Plan Id;Plan Name; Month;Y ear, SUM(Charge) FROM Calls; Calling Plans WHERE Calls:Plan Id = Calling Plans:Plan Id GROUPBY Calls:Plan Id; Plan Name; Month; Y ear
3
View V1 can be used to evaluate the query Q by collapsing multiple groups in V1 corresponding to the monthly plan earnings into annual plan earnings, and enforcing the additional conditions to get the 1995 summaries of plans earning less than a million dollars. The rewritten query Q0 that uses V1 is: Q : SELECT Plan Id;Plan Name, SUM(Monthly Earnings) FROM V1 WHERE Y ear = 1995 GROUPBY Plan Id;Plan Name HAVING SUM(Monthly Earnings) < 1000000 0
Note that the Calls table can be huge, and the materialized view V1 is likely to be orders of magnitude smaller than the Calls table. Hence, evaluating Q0 will be much more ecient than evaluating Q, emphasizing the importance of recognizing that Q can be rewritten to use the materialized view V1. 2
2 Notation and De nitions We consider SQL queries and views with grouping and aggregation. A view is de ned by a query, and the name of the view is associated with the result of the query. Queries and views have the form: Q: SELECT Sel(Q) FROM R1 (A1 ); : : : ; Rn (An ) WHERE Conds(Q) GROUPBY Groups(Q) HAVING GConds(Q)
For notational convenience, we modify the naming convention of standard SQL to guarantee
unique column names for each of the columns in a query. For example, let R and S be two tables
each with a single column named A. If a query Q has both R and S in its FROM clause, our notation would replace them by R(A1) and S(A2 ). Every reference to R:A in Q is replaced by A1, and every reference to S:A in Q is replaced by A2 . Similarly, if a query Q has two range variables R1 and R2 ranging over table R in its FROM clause, our notation would replace them by R(A1 ) and R(A2 ). Every reference to R1:A in Q is replaced by A1 , and every reference to R2:A in Q is replaced by A2. We use Tables(Q) to denote the set of tables fR1(A1 ); : : :; Rn(An )g in the FROM clause of Q, and Cols(Q) to denote A1 [ : : : [ An , i.e., the set of columns of tables in Tables(Q). The list of columns in the SELECT clause of Q, denoted by Sel(Q), consists of: (a) nonaggregation columns: this is a subset of the columns in Cols(Q), and is denoted by ColSel(Q); and (b) aggregation columns: these are of the form AGG(Y ), where Y is in Cols(Q) and AGG is one of the aggregate functions MIN, MAX, SUM, COUNT and AVG. The set of columns that are aggregated upon, such as Y above, is a subset of Cols(Q), and is denoted by AggSel(Q). The grouping columns of query Q, denoted by Groups(Q), consists of a subset of the columns in Cols(Q). SQL requires that if Groups(Q) is not empty, then ColSel(Q) must be a subset of Groups(Q). 4
The conditions in the WHERE clause of query Q, denoted by Conds(Q), consists of a boolean combination of built-in predicates, whose arguments are columns in Cols(Q) or constants. The conditions in the HAVING clause of query Q, denoted by GConds(Q), consists of a boolean combination of built-in predicates whose arguments are columns in Groups(Q), aggregation columns of the form AGG(Y ) where Y is in Cols(Q), and constants. For simplicity of exposition, we restrict the conditions in the WHERE and HAVING clauses to conjunctions of equality and inequality predicates of the form A op B, where A and B are columns of tables, aggregation columns or constants, and op is one of fg. Our results can be naturally extended to incorporate more general built-in predicates, e.g., those involving the arithmetic operations + and . Given a query Q, if Groups(Q); AggSel(Q) and GConds(Q) are empty, then Q is referred to as a conjunctive query. Otherwise, Q is referred to as an aggregation query. Determining that view V is usable in evaluating query Q requires (as we show later in the paper) that we consider mappings from V to Q. These are speci ed by column mappings, de ned below.
De nition 2.1 (Column Mapping) A column mapping from query Q1 to query Q2 is a mapping from Cols(Q1 ) to Cols(Q2 ) such that if R(A1 ; : : :; An) is a table in Tables(Q1 ), then: (1) there exists a table R(B1 ; : : :; Bn) in Tables(Q2 ), and (2) Bi = (Ai ); 1 i n. A 1-1 column mapping is a column mapping from Q1 to Q2 such that distinct tables in Tables(Q1 ) are mapped to distinct tables in Tables(Q2 ). Otherwise, the column mapping is a many-to-1 column mapping. 2 As a shorthand, if R is a table in Tables(Q1 ), we use (R(A1 ; : : :; An)) to denote R((A1 ); : : :; (An )), where A1 ; : : :; An are columns in Cols(Q1 ). We use similar shorthand notation for mapping query results, sets and lists of columns, sets of tables, and conditions.
De nition 2.2 (Rewriting of a query) A query Q0 is a rewriting of Q that uses view V if: 1. Q and Q0 are multiset-equivalent, i.e., they compute the same multiset of answers for any given database, and 2. Q0 contains one or more occurrences of V in its FROM clause.
2 In the sequel, we say that view V is usable in evaluating query Q, if there exists query Q0 such that Q0 is a rewriting of Q that uses V .
3 Aggregation Queries and Conjunctive Views In this section we consider the problem of using materialized views without grouping and aggregation to evaluate a query with grouping and aggregation. Intuitively, if a materialized view V is usable in evaluating a query Q, then V must \replace" some of the tables and conditions enforced in Q; other tables and conditions from Q must remain in the rewritten query Q0 . Consequently, for view V to be usable in answering query Q, it must be the case that: 5
The view V does not project out any columns needed by the query Q. Intuitively, a column A is needed by Q if it appears in the result of Q or if Q needs to enforce a condition involving A that has not been enforced in the computation of V . The view V does not discard any tuples needed by the query Q. Intuitively, a tuple is needed by Q if it satis es the conditions enforced in Q.
We formalize these intuitions below, show that they yield both necessary and sucient conditions, and present an algorithm to rewrite Q using V . We rst examine the case when the query does not have a HAVING clause, and then outline the eect of the HAVING clause on the conditions for usability and the rewriting algorithm.
3.1 Aggregation Queries Without a HAVING Clause Conditions for Usability
The conditions for usability of a view V in evaluating query Q are expressed formally below in terms of column mappings. Note that the conditions apply also to the restricted case when both the view and the query are conjunctive.
Condition C1 : There is a 1-1 column mapping from V to Q. Condition C2 : If a column A in ColSel(Q) [ Groups(Q) is a column in (Cols(V )), then Sel(V ) must have a column BA such that:
Conds(Q) implies (A = (BA )) Note that this condition is satis ed if BA = ?1 (A). Condition C3 : There exists a boolean combination of built-in predicates, Conds0, such that: Conds(Q) is equivalent to (Conds(V )) & Conds0, and Conds0 involves only the columns in Cols(Q) that are not in (Cols(V )) (i.e., columns of tables in Tables(Q) that are not in the image of the mapping ), and columns in (Sel(V )). Condition C4 : Suppose AGG(A) is in Sel(Q). If column A is in (Cols(V )), then: 1. If AGG is MIN, MAX or SUM, then there must exist a column BA in Cols(V ) such that Conds(Q) implies (A = (BA )), and Sel(V ) contains the column BA . 2. If AGG is COUNT, then Sel(V ) must not be empty. Condition C1 and the rst part of condition C3 essentially guarantee that the view is multiset equivalent to its image under ; similar conditions for conjunctive queries and conjunctive views were given in [CV93]. The second part of condition C3 ensures that constraints that are not enforced in the view can still be enforced in the query when the view is used, since they do not refer to columns that are projected out in the view and hence are no longer available. Conditions C2 and C4 ensure that the view does not project out any column that is required in the SELECT clause of the query. 6
Given a mapping satisfying condition C1, conditions C2; C3 and C4 can be veri ed by comparing the closures of Conds(Q) and (Conds(V )). 2
Rewriting Algorithm If conditions C1 { C4 are satis ed, the rewritten query Q0 is obtained from Q by replacing the tables in (Tables(V )) by (V ) in the FROM clause, where (V ) denotes V ((Sel(V ))). The SELECT and WHERE clauses of the query are then modi ed to re ect the use of view V . Formally, the rewritten query Q0 is obtained from Q by applying the following steps.
Step S1 : Replace all the tables in (Tables(V )) by (V ). Step S2 : Replace each column A in Groups(Q) [ ColSel(Q) [ AggSel(Q) by (BA ), where BA
satis es conditions C2 and C4, part 1. In the remaining steps of this algorithm, Groups(Q); ColSel(Q) and AggSel(Q) refer to these new column names. Step S3 : Determine a boolean combination of built-in predicates Conds0 satisfying condition C3 as above. Replace Conds(Q) in Q by Conds0. Step S4 : Consider an aggregation column COUNT(A) in Sel(Q) such that A is in (Cols(V )), but not in (Sel(V )). Replace COUNT(A) by COUNT(B), where B is any column in (V ). The following theorem shows that conditions C1 { C4 are necessary (when only equality predicates are used) and sucient conditions for usability of V in evaluating Q:
Theorem 3.1 Let Q be a single block aggregation query, and let V be a single block conjunctive view, such that Conds(Q) and Conds(V ) contain only equality and inequality predicates of the form A op B , where A and B are column names, or constants, and op is one of fg. If conditions C1 { C4 are satis ed, view V is usable in evaluating Q. In that case Q0 , obtained by applying steps S1 { S4 , is a rewriting of Q using V . If Conds(Q) and Conds(V ) contain only equality predicates, the view V is usable in evaluating Q only if conditions C1 { C4 are satis ed. 2 The following example illustrates the above conditions and the algorithm for obtaining the rewritten query.
Example 3.1 Let R1(A; B) and R2 (C; D) be two tables. Consider the query Q below: Q: SELECT A1 , SUM(B1 ) FROM R1 (A1 ; B1 ); R2 (C1 ; D1) WHERE A1 = C1 AND B1 = 6 AND D1 = 6 GROUPBY A1
Let the view V1 be: 2 The closure of a conjunction of built-in predicates, ( ), is the conjunction of all built-in predicates that are entailed by ( ). For the case of equality and inequality predicates of the form that we consider, the closure of ( ) has size polynomial in the size of ( ). Conds Q
Conds Q
Conds Q
Conds Q
7
V1 : SELECT C2 ; D2 FROM R1 (A2 ; B2 ); R2 (C2 ; D2) WHERE A2 = C2 AND B2 = D2
View V1 can be used to evaluate query Q since conditions C1 { C4 are satis ed.
Condition C1 : The 1-1 column mapping from V1 to Q is fA2 ! A1; B2 ! B1; C2 ! C1; D2 ! D1 g.
Condition C2 : For column A1, BA1 is the column C2 in Sel(V1 ), since it is the case that
Conds(Q) implies(A1 = C1). Condition C3 : Conds0 is given by D1 = 6, since (A1 = C1&B1 = 6&D1 = 6) is equivalent to ((A1 = C1&B1 = D1 )&D1 = 6). Condition C4 : For column B1, BB1 is the column D2 in Sel(V1 ). The rewriting of query Q that uses view V1 is: Q : SELECT C1 , SUM(D1 ) FROM V1 (C1 ; D1 ) WHERE D1 = 6 GROUPBY C1 0
2
3.2 Multiple Uses of Views Often, a query can make use of multiple materialized views, or the same materialized view multiple times. The rewriting algorithm presented above can also be used to incorporate multiple uses of materialized views. To obtain rewritings with multiple views we can create successive rewritings Q01; : : :; Q0n where each rewriting is obtained from the previous one by testing conditions C1 { C4, and applying the rewriting procedure described above. At each successive rewriting, the materialized views incorporated in previous rewritings are treated as database tables, rather than being expanded using their view de nitions. The following theorem states important properties of this iterative procedure.
Theorem 3.2 Let Q be a single block aggregation query, and let V1; : : :; Vn be single block conjunctive views, such that Conds(Q); Conds(V1); : : :; Conds(Vn) contain only equality and inequality predicates of the form A op B , where A and B are column names, or constants, and op is one of fg. Then the following hold: 1. An iterative application of the rewriting procedure consisting of steps S1 { S4 is sound, i.e., each successive rewriting is multiset-equivalent to Q. 2. The rewriting procedure has the Church-Rosser property, i.e., the result of rewriting query Q to incorporate views Vi1 ; : : :; Vij would be the same regardless of the order in which the views are considered.
8
3. If only equality predicates appear in Conds(Q); Conds(V1); : : :; Conds(Vn), then the iterative application procedure is complete, i.e., any rewriting of Q that uses one or more of V1 ; : : :; Vn can be obtained by iteratively applying the rewriting procedure.
2 It is important to note that, for the case of equality predicates, this procedure guarantees that we nd all ways of using the views to answer a query Q. In contrast, this property does not hold under the set semantics considered in [LMSS95], where there may exist rewritings that cannot be found by considering sequences of single view substitutions.
3.3 Aggregation Queries With a HAVING Clause Intuitively, when the query Q has a HAVING clause, the conditions for usability of a view V in evaluating Q and the rewriting algorithm need to be extended to account for:
Conditions in GConds(Q) that must be satis ed by the query, in addition to the conditions in Conds(Q), and Aggregation columns, of the form AGG(Y ), that occur in GConds(Q), but not in Sel(Q).
When query Q has a HAVING clause, the conditions in the HAVING clause may allow us to
strengthen the conditions in the WHERE clause of Q, without aecting the result of the query. For
example, if GConds(Q) is a conjunction of predicates, one of which is A > 5, where A is a column in Groups(Q), then the predicate A > 5 can be conjoined to the predicates in Conds(Q). As another example, if GConds(Q) is a conjunction of predicates, one of which is MAX(B) > 10, and MAX(B) is the only aggregation column appearing in Sel(Q), then the predicate B > 10 can be conjoined to the predicates in Conds(Q). In these two examples, the predicates A > 5 and MAX(B) > 10 can also be removed from GConds(Q). Several authors (e.g., [LMS94, RSSS95, LMS96]) have considered the problem of inferring conditions that can be conjoined to Conds(Q) given the conditions in GConds(Q), and removing redundant conditions in GConds(Q). These techniques can be applied to rewrite the query Q, as a pre-processing step yielding possibly modi ed conditions Conds(Q) and GConds(Q). The modi ed Conds(Q) is then used in checking conditions C2 { C4. Note that performing this pre-processing step, strengthening Conds(Q), potentially allows us to detect usability of views that would otherwise not be determined to be usable. In addition, since conditions in the HAVING clause of Q may contain aggregation columns of the form AGG(A) that do not occur in Sel(Q), condition C4 needs to be extended to take such aggregation columns into account as well, if column A is in (Cols(V )). The rewriting algorithm for the case of conjunctive views and aggregation queries with a HAVING clause is obtained by incorporating the above two extensions to the conditions for usability. Speci cally, step S3 of the modi ed algorithm uses the strengthened Conds(Q), described above, to determine Conds0. Also, step S3 replaces the conditions GConds(Q) in the HAVING clause of Q (after the pre-processing step) by conditions GConds0 in Q0, obtained by replacing column names in the obvious manner. In addition, step S4 of the modi ed rewriting algorithm considers aggregation columns of the form COUNT(A) in GConds(Q) in addition to aggregation columns of the form COUNT(A) in Sel(Q). 9
4 Aggregation Queries and Views In this section we consider the problem of using views when both the views and the query have grouping and aggregation. Recall that the two intuitive requirements for the usability of a conjunctive view V in answering a query Q (described in the beginning of Section 3) are that V not project out columns needed in Q, and that V not discard tuples needed in Q. In the presence of grouping and aggregation in the view, these requirements become more subtle:
An aggregation over a column in view V can be thought of as though that column was partially projected out, since V contains just aggregate values over that column, and not the original column values themselves. A groupby in view V results in the multiplicities of the tuples being lost.
However, as the following examples illustrate, in some cases it is possible to overcome the diculties introduced by grouping and aggregation in the view.
4.1 Illustrative Examples The following example illustrates that the aggregate information in a view may be sucient to compute the aggregate information needed in the query.
Example 4.1 (Coalescing Subgroups)
Consider the following example. Let R1(A; B; C; D) and R2(E; F) be two tables. Consider the query Q below: Q: SELECT A1 ; E1 , COUNT(B1) FROM R1 (A1 ; B1 ; C1 ; D1 ); R2 (E1 ; F1 ) WHERE C1 = F1 AND B1 = D1 GROUPBY A1 ; E1
Let the view V1 be: V1 : SELECT A2 ; C2 , COUNT(D2) FROM R1 (A2 ; B2 ; C2 ; D2 ) WHERE B2 = D2 GROUPBY A2 ; C2
The view V1 groups the table R1 by the A and C columns, and computes the multiplicity of each such group. The query Q, on the other hand, groups the table R1 only on the A column, resulting in more coarse groups than those computed in V1 . However, the counts of the A groups in Q can be computed by summing the counts of the (A; C) groups in V1, as illustrated in the following rewritten query. Q : SELECT A1 ; E1 , SUM(N1 ) FROM V1 (A1 ; C1 ; N1 ); R2 (E1 ; F1 ) WHERE C1 = F1 GROUPBY A1 ; E1 0
10
2 The following example illustrates that the existence of other columns in the view may enable us to recover the tuple multiplicities lost because of grouping in the view.
Example 4.2 (Recovery of Lost Multiplicities)
Let R1 (A; B; C; D) and R2(E; F) be two tables. Consider the query Q below: Q: SELECT A1 , SUM(E1 ) FROM R1 (A1 ; B1 ; C1 ; D1 ); R2 (E1 ; F1 ) GROUPBY A1
Let the view V1 be: V1 : SELECT A2 ; B2 , SUM(C2 ) FROM R1 (A2 ; B2 ; C2 ; D2 ) GROUPBY A2 ; B2
Although V1 does not project out any column that is needed in the query, V1 cannot be used to evaluate the query Q. This is because the multiplicity of the A column of R1 is needed in order to compute SUM(E1 ), but that multiplicity is lost in the view. However, consider the view V2 below, which retains also the count of the C column of R1. V2 : SELECT A3 ; B3 , SUM(C3 ), COUNT(C3) FROM R1 (A3 ; B3 ; C3 ; D3 ) GROUPBY A3 ; B3
Although the multiplicities of the A column are not explicit in V2 , they can be computed as follows. Va : SELECT A1 , SUM(N1 ) FROM V2 (A1 ; B1 ; S1 ; N1 ) GROUPBY A1
We can therefore use V2 to answer the query Q as follows. Q : SELECT A1 , Cnt V a SUM(E1 ) FROM V2 (A1 ; B1 ; S1 ; N1 ); V a(A4 ; Cnt V a); R2 (E1 ; F1 ) WHERE A1 = A4 GROUPBY A1 ; Cnt V a 0
2 As the examples illustrate, to use views that involve aggregations, we need to verify that the aggregate information in the view is sucient to compute the aggregates needed in the query, and that the correct multiplicities exist or can be computed. We formalize these intuitions below, present conditions for usability, and provide an algorithm to rewrite Q using V . We rst examine the case when the query and view do not have a HAVING clause, and then outline the eect of the HAVING clause on the conditions for usability and the rewriting algorithm. 11
4.2 Aggregation Queries and Views Without HAVING Clauses
Conditions for Usability
To specify conditions for usability for views with grouping and aggregation, we need to slightly modify conditions C2 and C3, and to substantially modify condition C4 to deal with the dierent cases of aggregates appearing in the SELECT clause of the query. The modi ed conditions are as follows:
Condition C20 : If a column A in Groups(Q) is a column in (Cols(V )), then ColSel(V ) (i.e., the non-aggregation columns in the SELECT clause of V ) must have a column BA such that: Conds(Q) implies (A = (BA ))
Condition C30 : There exists Conds0 , such that:
Conds(Q) is equivalent to (Conds(V )) & Conds0, and Conds0 involves only the columns in Cols(Q) that are not in (Cols(V )), and the columns in (ColSel(V )). Note that since ColSel(Q) must be a subset of Groups(Q), condition C20 is a generalization of condition C2. Also note that the second part of condition C30 does not allow Conds0 to constrain any of the columns in (AggSel(V )). Intuitively, this is because the columns in AggSel(V ) are aggregated upon in view V , and hence are not \available" for imposition of additional constraints in the rewritten query Q0. Condition C40 : Suppose AGG(A) is in Sel(Q). 1. If column A is in (Cols(V )), then: (a) If AGG is MIN, MAX or SUM, then there must exist a column BA in Cols(V ) such that Conds(Q) implies (A = (BA )), and Sel(V ) contains either the non-aggregation column BA , or an aggregation column of the form AGG(BA ). (b) If AGG is SUM or COUNT, then Sel(V ) must include a column of the form COUNT(A0 ), where A0 is a column in Cols(V ). 2. If column A is not in (Cols(V )), and AGG is either SUM or COUNT, then Sel(V ) must include a column of the form COUNT(A0), where A0 is a column in Cols(V ). Intuitively, condition C40 guarantees that the columns in the view contain enough information to compute the aggregates required in the query. In particular, conditions C40 , part 1(b) and part 2 guarantee that we can recover the multiplicities in the view in order to perform an aggregation that depends on such multiplicities (i.e., either SUM or COUNT). The two parts of the condition cover the cases when the aggregation is on a column mapped by the view, or not mapped by the view, respectively.
Conditions C20 { C40 can be checked, with minor modi cations, using the techniques for checking conditions C2 { C4 .
12
Rewriting Algorithm If conditions C1; C20 { C40 are satis ed, the rewritten query Q0 is obtained from Q by applying the following steps. Steps S10 { S30 are similar to steps S1 { S3 above. Steps S40 and S50 enumerate the various kinds of aggregation that may occur in the view and the query.
Step S10 : Replace all the tables in (Tables(V )) by (V ), where (V ) is de ned as follows: for each
non-aggregation column A in Sel(V ), (V ) contains the column (A); for each aggregation column A in Sel(V ), (V ) contains a new column name. Step S20 : Replace each column A in Groups(Q) [ ColSel(Q) [ AggSel(Q) by (BA ), where BA satis es conditions C20 and C40 , part 1(a). In the remaining steps of this algorithm, Groups(Q); ColSel(Q) and AggSel(Q) refer to these new column names. Step S30 : Determine a boolean combination of built-in predicates Conds0 satisfying condition C30 as above. Replace Conds(Q) in Q by Conds0. Step S40 : Consider an aggregation column AGG(A) in Sel(Q) such that A is in (Cols(V )). 1. Let AGG be MIN, MAX or SUM. By condition C40 , part 1, there are two cases to consider. (a) Suppose Sel(V ) contains the aggregation column AGG(?1 (A)). Let S denote the corresponding column in (V ). Replace AGG(A) in Sel(Q) by AGG(S). (b) Suppose Sel(V ) contains the non-aggregation column ?1(A). If AGG is either MIN or MAX, leave AGG(A) in Sel(Q) unchanged. If AGG is SUM, then by condition C40 , part 1(b), Sel(V ) must include a column of the form COUNT(A0 ). Let N denote the corresponding column in (V ). Let QV Groups = (Groups(V )) \ (Groups(Q) [ A), i.e., QV Groups contains A along with the grouping columns in Q that are also grouped in V . By condition C20 , columns in QV Groups must also be in (V ). Construct a new view V a as follows: V a: SELECT QV Groups, A SUM(N ) FROM (V ) GROUPBY QV Groups
Sum V a) to the tables in Tables(Q), where Sum V a is a new colAdd V a(G; umn name, and G is a list of n new column names, where n is the number of columns in QV Groups. Add n equality predicates equating corresponding columns in QV Groups and G to the WHERE clause of Q. Finally, replace SUM(A) in Sel(Q) by SUM(Sum V a). 2. Let AGG be COUNT. By condition C40 , part 1(b), Sel(V ) must include a column of the form COUNT(A0 ). Let N denote the corresponding column in (V ). Replace COUNT(A) in Sel(Q) by SUM(N). Step S50 : Consider an aggregation column AGG(A) in Sel(Q) such that column A is not in (Cols(V )). If AGG is MIN or MAX, leave AGG(A) unchanged. If AGG is SUM or COUNT, do the following. By condition C40 , part 2, Sel(V ) must include a column of the form COUNT(A0 ). Let N denote the column in (V ) corresponding to that COUNT(A0 ) column. Let QV Groups = (Groups(V )) \ Groups(Q), i.e., QV Groups 13
contains the grouping columns in Q that are also grouped in V . Condition C20 entails that columns in QV Groups must also occur in (V ). Construct a new view V a as follows: V a: SELECT QV Groups, SUM(N ) FROM (V ) GROUPBY QV Groups
Cnt V a) to the tables in Tables(Q), where Cnt V a is a new column name, and G is Add V a(G; a list of n new column names, where n is the number of columns in QV Groups. Add Cnt V a to Groups(Q) (but not to ColSel(Q)), and add n equality predicates equating corresponding columns in QV Groups and G to the WHERE clause of Q. Finally, replace AGG(A) in Sel(Q) by Cnt V a AGG(A). Intuitively, Cnt V a needs to be computed when Groups(Q) does not contain all the columns in (Groups(V )), to \sum" the multiplicities of the dierent groups in V that are merged in Q. (If Groups(Q) contains all the columns in (Groups(V )), then using V a as above is still correct, though it can be simpli ed.) Cnt V a) in Tables(Q) and Cnt V a in Groups(Q) does not Observe that introducing V a(G; aect the multiplicities of the (intermediate) multisets constructed, since G is a super-key of view V a, and the newly added predicates in the WHERE clause of the rewritten query Q0 ensure that V is joined with V a on this super-key. The following theorem states that conditions C1; C20 { C40 are sucient conditions for usability of V in evaluating Q:
Theorem 4.1 Let Q be a single block aggregation query and let V be a single block aggregation view, such that Conds(Q) and Conds(V ) contain only equality and inequality predicates of the form A op B and A op b, where A and B are column names, b is a constant, and op is one of fg. If conditions C1; C20 { C40 are satis ed, view V is usable in evaluating Q. In that case Q0, obtained by applying steps S10 { S50 , is a rewriting of Q using V . 2
Example 4.3 Consider again the query Q and view V1 from Example 4.1. View V1 can be used to evaluate Q since conditions C1; C20 { C40 are satis ed. Condition C1 Condition C20 Condition C30 Condition C40
: The mapping from V1 to Q is fA2 ! A1 ; B2 ! B1 ; C2 ! C1; D2 ! D1 g. : For column A1 in Groups(Q), BA1 is the column A2 in ColSel(V1 ). : Conds0 is C1 = F1 , since the condition B1 = D1 is enforced in V1 . : For column COUNT(B1 ) in Sel(Q), there is a column COUNT(D2 ) in Sel(V ).
The rewritten query Q0 resulting from applying the above steps is given in Example 4.1. 2
Example 4.4 (Constraining (AggSel(V )))
Let R1 (A; B; C; D) and R2(E; F) be two tables. Consider the query Q below:
Q: SELECT A1 ; E1 , SUM(B1 ) FROM R1 (A1 ; B1 ; C1 ; D1 ); R2 (E1 ; F1 ) WHERE B1 = F1 GROUPBY A1 ; E1
14
Let the view V be: V : SELECT A2 ; E2 ; F2 , SUM(B2 ) FROM R1 (A2 ; B2 ; C2 ; D2 ); R2 (E2 ; F2 ) GROUPBY A2 ; E2 ; F2
View V cannot be used to evaluate Q above, although in the absence of the WHERE clause in Q, V could be used to evaluate Q. Intuitively, this is because the built-in predicates in the query constrain the possible values of B1 , and B2 is aggregated upon in the view V ; no condition on the result of the SUM in V can capture the eect of the condition on B1 in Q. 2
4.3 Aggregation Queries and Views With HAVING Clauses In Section 3.3, we described the usability conditions and rewriting algorithm for an aggregation query with a HAVING clause and a conjunctive view. Here, we brie y describe the extensions required for aggregation queries and views with HAVING clauses. Essentially, the additional subtleties that must be considered involve the relationships between the GROUPBY and HAVING clauses in the view V and the query Q. Intuitively, the HAVING clause in V may eliminate certain groups in V (i.e., those that do not satisfy GConds(V )). If any of these eliminated groups in V is \needed" to compute an aggregate function over a group in Q, by coalescing multiple groups in V , then view V cannot be used to evaluate query Q. Hence, condition C30 must be extended to test whether there exists GConds0 such that GConds(Q) is equivalent to the combination of GConds(V ) and GConds0, taking the grouping columns Groups(V ) and Groups(Q) into account. Before checking any of the conditions for usability, the query Q and view V are independently pre-processed to \move" maximal sets of conditions from the HAVING clause to the WHERE clause, as discussed in Section 3.3; the resulting normal form allows independent comparison of Conds(Q) and Conds(V ), on the one hand, and of GConds(Q) and GConds(V ), on the other. The rewriting algorithm takes these additional re nements of the conditions of usability into account. Speci cally, step S30 determines a GConds0 in addition to Conds0, using GConds(V ) and GConds(Q) (resulting from the pre-processing step). Steps S40 and S50 are augmented to compute aggregation columns appearing in GConds(Q), in addition to those appearing in Sel(Q).
4.4 Incorporating AVG The aggregate functions SUM, COUNT and AVG are related in that, given values for two of them over some column, the third can be computed. The usability conditions and the rewriting algorithms above can be generalized to allow Sel(Q); GConds(Q); Sel(V ) and GConds(V ) to have AVG(A) as a column, by taking this relationship into account.
4.5 Conjunctive Queries and Aggregation Views Consider the case when the query Q is a conjunctive query (i.e., no grouping and aggregation), but the view V has grouping and aggregation. In this case, view V cannot be used to evaluate Q if the multiset semantics is desired, because the GROUPBY clause in the view results in losing information about the multiplicities of tuples. Consider the following illustrative example: 15
Example 4.5 Let R1(A; B; C) be a table. Consider the query Q below: Q: SELECT A1 ; B1 FROM R1 (A1 ; B1 ; C1 )
Let the view V1 be: V1 : SELECT A2 ; B2 , COUNT(C2 ) FROM R1 (A2 ; B2 ; C2 ) GROUPBY A2 ; B2
There is a 1-1 column mapping from V1 to Q, Sel(V1 ) contains all the columns required in the SELECT clause of Q, and there are no constraints enforced by either of the WHERE clauses. Even though COUNT(C2 ) has the required multiplicity information, this information cannot be used, and there is no rewriting of Q that uses view V1 . 3 2
5 Sets and Keys In general, any table in a database and the result of any query can be a multiset. In practice, however, many database tables have keys, and are thus guaranteed to be sets with no duplicates in them. In this section we rst explore how it can be determined, using such meta-data about the database schema, that the result of a query or a view is a set, independent of the contents of database tables. We then discuss how such information aects the usability of a view in answering a query.
5.1 Using Keys A common way in which to determine that a table must be a set is if the table has a key. We brie y summarize some relevant properties of keys of tables. There is at most one tuple in a table with a given key value. The Cartesian product of two tables has as key the concatenation of keys of the two tables. In general, no smaller set of columns suces as a key. However, in the case of a foreign key join (i.e., a key of table R2 also appears in table R1 and is used as a column for the join of R1 and R2), the key of the leading table R1 suces as a key for the join result. (Given any tuple in R1, it can join only with one tuple in R2.) Knowledge about functional dependencies can also be used to infer keys; in particular, if column A functionally determines column B, and B is a key, then so is A. Conceptually, it is useful to divide the evaluation of a query into two phases. In the rst phase, the FROM and WHERE clauses are evaluated to create a single intermediate conceptual table, which we call the core table of the given query. The SELECT, GROUPBY and HAVING clauses then apply to this core table. The following propositions summarize when the result of a query is a set, rather than a multiset.
Proposition 5.1 The result of a conjunctive query Q is a set if and only if the core table of Q is a set and the columns retained in the SELECT clause include a key of the core table. 2 3 However, if we assume the existence of an interpreted table ( ) which contains one copy of each of the natural numbers, 0 1 , then it is possible to write the desired SQL query. Alternatively, [GHQ95] have suggested an \expand" operator to replicate tuples in a table an appropriate number of times. N at N
;
;:::
16
Proposition 5.2 The core table of a query is a set if and only if each of the tables speci ed in the FROM clause of the query is a set. 2
5.2 Exploiting Set Semantics The techniques outlined in the previous section allow us to determine that the result of a query Q and a view V are guaranteed to be sets. Similarly, if both the query and the view use SELECT DISTINCT, then their results are sets, by de nition. In these cases, we can simplify the conditions for the view to be usable in evaluating the query, as well as the steps required to rewrite the query; [LMSS95] gave similar conditions for conjunctive queries and views under the set semantics. Consider the case of conjunctive queries and views (i.e., there is no grouping and aggregation in the query or the view), where the query and the view results are known to be sets. Condition C1 can be relaxed to allow many-to-1 column mappings instead of 1-1 column mappings, while conditions C2 and C3 are still required. (Condition C4 dealt with aggregates, and is not applicable here.) The rewritten query is obtained by applying steps S1 { S3 , with minor modi cations to handle repeated column names in the FROM clause because of the many-to-1 mappings. Consider the following illustrative example:
Example 5.1 Let R1(A; B; C) be a table, where column A is a key of R1. Consider the query Q
below:
Q: SELECT A1 FROM R1 (A1 ; B1 ; C1 ) WHERE B1 = C1
Let the view V1 be: V1 : SELECT A2 ; A3 FROM R1 (A2 ; B2 ; C2 ); R1 (A3 ; B3 ; C3 ) WHERE B2 = C3
Since the query Q and the view V1 return sets (because column A is a key of table R1), can be a many-to-1 mapping; in particular the mapping : fA2 ! A1 ; A3 ! A1 ; B2 ! B1 ; B3 ! B1 ; C2 ! C1; C3 ! C1g can be used. Conditions C2 and C3 are also satis ed. The rewritten query Q0 that uses view V1 is: Q : SELECT A1 FROM V1 (A1 ; A4 ) WHERE A1 = A4 0
Observe that, in the absence of the information about keys, Q0 is not a valid rewriting of Q, and furthermore V is not usable in evaluating Q. 2
6 Related Work There has been previous work on using views to answer queries (e.g., [CKPS95, YL87, SJGP90, TSI94, CR94]), but the formal aspects of this problem, particularly for SQL, have received less attention. 17
Caching of previous query results was explored in [Sel88, SJGP90] as a means of supporting stored procedures. This corresponds to using materialized views when they match syntactically a sub-expression of the query. In the ADMS optimizer [CR94], subquery expressions corresponding to nodes in the query execution (operator) tree were also cached. A cached result is matched against a new query by using common expression analysis [Fin82]. Semantic matching between a cached view and a (sub)query via mappings, and grouping and aggregation issues, were not addressed. View usability has been studied for conjunctive queries with set semantics and without grouping and aggregation in, e.g., [YL87, LMSS95]. Levy et al. [LMSS95] showed a close connection between the problem of usability of a view in evaluating a query and the problem of query containment. However, this connection does not carry over to the multiset case. [LMSS95] also presented a simple technique for generating a rewriting of a query Q using view V , under the set semantics. Essentially, the technique consists of rst conjoining V to the FROM clause of Q, and then (independently) minimizing the resulting query to eliminate redundant tables. In the case of SQL queries, however, the multiset semantics requires that the elimination of tables in Q made redundant by adding V to the FROM clause be performed as an intrinsic part of determining usability of V in evaluating Q. Optimization of conjunctive SQL queries using conjunctive views has been studied in [CKPS95]. In addition to considering when such views are usable in evaluating a query, they consider how to perform this optimization in a cost-based fashion. However, they did not consider the presence of keys in tables, or grouping and aggregation. A related problem to ours is studied in Gupta et al. [GMR95]. They assume that a materialized view may be rede ned, and investigate how to adapt the materialization of the view to re ect the rede nition. This problem is clearly a special case of the one we study, with the additional assumptions that the system knows the type of modi cation that took place, that the new view de nition is \close" to the old de nition, and that the view materialization may be modi ed. When all of these conditions hold our algorithms will recognize the possibility of using the old view de nition in evaluating the new one; however, the algorithms in [GMR95] may materialize the new view de nition more eciently, since they update the materialized view rather than recomputing it. In general, though, there may be better ways to compute the new view de nition by using other materialized views, and our algorithms will nd those ways as well. Although our algorithms may create a larger search space for the optimizer, we believe this is not a practical concern for the type of queries discussed in [GMR95]. Yan and Larson [YL94] and Chaudhuri and Shim [CS94] considered the problem of pushing grouping and aggregation through joins. The approach of [CS94] generalizes the approach of [YL94], creating a larger search space for the query optimizer. The techniques of [CS94] identify subtrees in the operator tree representing the query Q where grouping and aggregation can be applied to all or part of the required columns. The transformations they consider are for a more restricted case than the one we study, since they consider the grouping and aggregation relationship between a query and a portion of that query, while we consider unrelated queries and views. Hence, the conditions under which their transformations are possible are special cases of our conditions of view usability for queries and views with grouping and aggregation. Recently, Gupta et al. [GHQ95] independently considered the problem of using materialized aggregation views to answer aggregation queries, by comparing the query tree representations of the view and the query. Their approach uses transformation rules on the query tree representations, does not provide any formal guarantees of completeness, and does not deal with many of the cases covered by our approach. Speci cally, their algorithm does not take the conditions in the WHERE and HAVING clauses into account when comparing Sel(Q) with Sel(V ) and Groups(Q) with Groups(V ) (see, e.g., conditions C20 and C40 ). For example, their techniques would not determine the usability 18
of view V1 in evaluating query Q in Example 1.1. Such cases arise frequently as a result of equi-joins between tables. Finally, a large body of work has been devoted to the view maintenance problem, i.e., keeping materialized views up-to-date when the underlying tables are updated (e.g., [BLT86, CW91, GMS93]). This issue is orthogonal to the usage of these views in evaluating a query, which is the topic of this paper.
7 Conclusions The exploitation of materialized views is likely to be an important technique for performance enhancement, particularly for applications where access to the base data is more expensive than access to the view. In this paper we presented general techniques to rewrite a given SQL query so that it uses materialized views, if possible. We have focused on single-block SQL queries and views. Often, multi-block SQL queries (e.g., queries with view tables in the FROM clause) can be transformed to single-block queries, e.g., using techniques described in [YL94, CS94, GHQ95]. In such cases, our techniques can also be applied. We are currently extending our work in several ways, including considering the view usage problem for arbitrary nested queries, integrating our techniques with algebraic cost-based optimizers along the lines described in [CKPS95], and developing strategies for determining which views to cache.
References [BI94] [BLT86] [CKPS95] [CR94] [CS94] [CV93] [CW91] [Dat94] [Fin82]
Daniel Barbara and Tomasz Imielinski. Sleepers and workaholics: Caching strategies in mobile environments. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 1{12, 1994. J. A. Blakeley, P-A. Larson, and F. W. Tompa. Eciently updating materialized views. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 61{71, Washington D.C., May 1986. Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, and Kyuseok Shim. Optimizing queries with materialized views. In Proceedings of International Conference on Data Engineering, 1995. C. M. Chen and N. Roussopoulos. The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching. In Proceedings of the International Conference on Extending Database Technology, 1994. Surajit Chaudhuri and Kyuseok Shim. Including group-by in query optimization. In Proceedings of the International Conference on Very Large Databases, 1994. Surajit Chaudhuri and Moshe Y. Vardi. Optimization of real conjunctive queries. In Proceedings of the ACM Symposium on Principles of Database Systems, 1993. Stefano Ceri and Jennifer Widom. Production rules for incremental view maintenance. In Proceedings of the International Conference on Very Large Databases, Barcelona, Spain, 1991. Anindya Datta. Research issues in databases for ARCS: Active rapidly changing data systems. SIGMOD Record, 23(3):8{13, September 1994. S. Finkelstein. Common expression analysis in database applications. In Proceedings of the ACM SIGMOD Conference on Management of Data, 1982.
19
[GHQ95]
Ashish Gupta, Venky Harinarayan, and Dallan Quass. Generalized projections: A powerful approach to aggregation. In Proceedings of the International Conference on Very Large Databases, 1995. [GJM96] Ashish Gupta, H. V. Jagadish, and Inderpal S. Mumick. Data warehousing using selfmaintainable views. In Proceedings of EDBT 96 (to appear), 1996. [GMR95] Ashish Gupta, Inderpal S. Mumick, and Kenneth Ross. Adapting materialized views after rede nitions. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Jose, CA, May 1995. [GMS93] Ashish Gupta, Inderpal S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 157{166, 1993. [HSW94] Yixiu Huang, Prasad Sistla, and Ouri Wolfson. Data replication for mobile computers. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 13{24, 1994. [JMS95] H. V. Jagadish, Inderpal S. Mumick, and Abraham Silberschatz. View maintenance issues for the Chronicle data model. In Proceedings of the ACM Symposium on Principles of Database Systems, San Jose, CA, May 1995. [LMS94] Alon Y. Levy, Inderpal S. Mumick, and Yehoshua Sagiv. Query optimization by predicate move-around. In Proceedings of the International Conference on Very Large Databases, Santiago, Chile, September 1994. [LMS96] Alon Y. Levy, Inderpal S. Mumick, and Yehoshua Sagiv. Reasoning with aggregation constraints. In Proceedings of EDBT 96 (to appear), 1996. [LMSS95] Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, and Divesh Srivastava. Answering queries using views. In Proceedings of the ACM Symposium on Principles of Database Systems, San Jose, CA, May 1995. [LSK95] Alon Y. Levy, Divesh Srivastava, and Thomas Kirk. Data model and query evaluation in global information systems. Journal of Intelligent Information Systems, 5:121{143, 1995. Special Issue on Networked Information Discovery and Retrieval. [RSSS95] Kenneth A. Ross, Divesh Srivastava, Peter Stuckey, and S. Sudarshan. Foundations of aggregation constraints. An early version appeared in Proceedings of the Second International Workshop on Principles and Practice of Constraint Programming, 1994, LNCS 874., 1995. [Sel88] Timos Sellis. Intelligent caching and indexing techniques for relational database systems. Information Systems, pages 175{185, 1988. [SJGP90] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules, procedures, caching and views in database systems. In Proceedings of the ACM SIGMOD Conference on Management of Data, 1990. [TSI94] Odysseas G. Tsatalos, Marvin H. Solomon, and Yannis E. Ioannidis. The GMAP: A versatile tool for physical data independence. In Proceedings of the International Conference on Very Large Databases, pages 367{378, Santiago, Chile, September 1994. [YL87] H. Z. Yang and P.-A. Larson. Query transformation for PSJ-queries. In Proceedings of the International Conference on Very Large Databases, pages 245{254, 1987. [YL94] W. P. Yan and P-A. Larson. Performing group-by before join. In Proceedings of International Conference on Data Engineering, pages 89{100, 1994. [ZGMHW95] Yue Zhuge, Hector Garcia-Molina, Joachim Hammer, and Jennifer Widom. View maintenance in a warehousing environment. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Jose, CA, May 1995.
20