On the Optimal Nesting Order for Computing N-Relational Joins

TOSHIHIDE IBARAKI, Toyohashi University, and TIKO KAMEDA, Simon Fraser University

Using the nested loops method, this paper addresses the problem of minimizing the number of page fetches necessary to evaluate a given query to a relational database. We first propose a data structure whereby the number of page fetches required for query evaluation is substantially reduced and then derive a formula for the expected number of page fetches. An optimal solution to our problem is a nesting order of relations in the evaluation program that minimizes the number of page fetches. Since the minimization of the formula is NP-hard, as shown in the Appendix, we propose a heuristic algorithm which produces a good suboptimal solution in polynomial time. For the special case where the input query is a "tree query," we present an efficient algorithm for finding an optimal nesting order.

Categories and Subject Descriptors: H.2 [Information Systems]: Database Management; F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity

General Terms: Algorithms

Additional Key Words and Phrases: Query processing, relational database, join

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under grants A4315, A5240, and T1333 and by the Ministry of Education, Science, and Culture of Japan under Scientific Research Grant-in-Aid.
Authors' addresses: T. Ibaraki, Dept. of Information and Computer Sciences, Toyohashi University, Tempaku-Cho, Toyohashi, Japan 440; T. Kameda, Dept. of Computing Science, Simon Fraser University, Burnaby, B.C., Canada V5A 1S6.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1984 ACM 0730-0301/84/0900-0482 $00.75
ACM Transactions on Database Systems, Vol. 9, No. 3, September 1984, Pages 482-502.

1. INTRODUCTION

The relational database model was proposed by Codd in 1970 [9] and, because of its solid theoretical foundation, has since been widely accepted by researchers in database systems. Other features of the relational model include data independence and high-level relational operations, which make it easy for nonprogrammers to use [10]. More recently, real database management systems based on the relational model have been constructed [4, 20], and a number of such systems are now commercially available [12]. In spite of its superiority in many aspects over other models such as hierarchical and network models, it has yet to be proven that large relational databases comparable in speed to other kinds of database systems can be built. Since the queries in a relational database system are expressed in a high-level nonprocedural language, it is the responsibility of the system to translate them into efficient programs which evaluate them. The efficiency problem of query processing has been recognized as one of the most difficult hurdles to cross before large relational databases can become practical.

Suppose, for example, that we want to compute the join of four relations, $R_1$, $R_2$, $R_3$, and $R_4$, using the "nested-iteration method" of [13]. Let the number of disk pages used to store these relations be $M_1 = 16$, $M_2 = 32$, $M_3 = 64$, and $M_4 = 128$, respectively. The number of page fetches required by the "nested-iteration method" is $M_1 + M_1 M_2 + M_1 M_2 M_3 + M_1 M_2 M_3 M_4$ [14], which is over four million.

A large amount of time and effort has been spent on the problem of query processing in relational database systems, and many solutions have been proposed, analyzed, and/or implemented [2, 3, 4, 5, 7, 13, 14, 18, 19, 20, 22, 25], ranging from a rather simple heuristic to an almost exhaustive search. For all reasonable models of query processing cost, a true optimization of cost appears computationally intractable. However, since normally only a few relations are involved in a typical query (say, one to five, and not more than ten), an exhaustive search may not necessarily be impractical. On the other hand, since searches often rely on crudely estimated parameters, the result of an exhaustive search may not always give a better method of query evaluation than some heuristics.
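The arithmetic behind the four-relation example can be checked with a short sketch. The function name `nested_iteration_fetches` is introduced here only for illustration; it evaluates the sum of prefix products $M_1 + M_1 M_2 + \cdots + M_1 M_2 \cdots M_n$ for a given nesting order.

```python
from itertools import accumulate
from operator import mul


def nested_iteration_fetches(pages):
    """Page fetches of plain nested iteration for relations scanned in the
    given order: M1 + M1*M2 + ... + M1*M2*...*Mn (a sum of prefix products)."""
    return sum(accumulate(pages, mul))


sizes = [16, 32, 64, 128]                       # M1..M4 from the example
print(nested_iteration_fetches(sizes))          # 4227600
print(nested_iteration_fetches(sizes[::-1]))    # 4464768
```

Even without any auxiliary structure the count depends on the nesting order: scanning the largest relation in the outermost loop raises it from 4,227,600 to 4,464,768 in this example.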
In this paper we restrict ourselves to the nested loops method of query evaluation, and try to minimize the number of page fetches necessary to evaluate a given query.¹ We further impose the condition that each relation be sorted at most once, assuming that sorting is relatively costly. In most current systems, page accesses are the most time-consuming operations in query evaluation [7]; we believe that the nested loops method has several merits (for instance, ease of programming), and in certain situations, the number of page accesses required by this method can be kept small [26]. Because we restrict ourselves to the nested loops method, it is easier to formulate the optimization problem than if all possible methods of query evaluation, including the merge-scan method [7], were considered. However, we prove that our problem is still NP-complete, indicating its intractability when a large number of relations are involved in a query. We therefore propose a heuristic algorithm for finding a good, though not necessarily optimal, nesting order.

Our main contributions in this paper are the following:

(1) Introduction of a new structure into the nested loops method.
(2) Derivation of a formula for the expected number of page accesses.
(3) Proof that the minimization of the expected number of page accesses is NP-complete.
(4) A heuristic algorithm for finding a good nesting order.
(5) A polynomial time algorithm to determine an optimal nesting order for the special case in which the given query is a tree query and each predicate in the query has a relatively small probability of being satisfied.

¹ Adachi et al. [2, 3] investigate a similar problem through a different approach.

In Section 2, we introduce the notation and assumptions used throughout this paper. In Section 3, the structure of a nested loops program used to evaluate queries is described. Section 4 presents a detailed derivation of a formula for the expected number of page fetches required by the program in Section 3. On the basis of the analysis in Section 4, we try in Section 5 to minimize the expected number of page fetches and, since the desired minimization is NP-complete (see the Appendix), we suggest a heuristic algorithm for minimization. Finally, in Section 6, we treat a special case in which a given query is a "tree query" and, for this special case, present a polynomial time minimization algorithm. The Appendix contains a proof that the problem of minimizing page fetches, as formulated in this paper, is in general NP-complete, justifying our approach in Section 5.

2. NOTATION AND ASSUMPTIONS

Let $R_1, \ldots, R_n$ be the relations referenced in a given query and, for $i = 1, \ldots, n$, let $N_i$ and $M_i$ be the number of tuples of $R_i$ and the number of pages needed to store all tuples of $R_i$, respectively. $t_i$ denotes the length (in bytes) of each tuple of $R_i$ and $s$ denotes the page size (in bytes). Therefore, assuming that tuples from different relations do not coreside in a page, $M_i = \lceil N_i t_i / s \rceil$. In what follows we use the approximation $M_i = N_i t_i / s$ for simplicity. In this paper, attribute $a$ of relation $R_i$ is denoted by $R_i.a$, and the value of an attribute $a$ of a particular tuple $T \in R_i$ is denoted by $T.a$.

We assume that each query $Q$ is a conjunction of "simple" predicates, that is, $Q = P_1 \wedge P_2 \wedge \cdots \wedge P_m$. Each conjunct $P_j$ is of the form $R_i.a\,\theta\,u$ or $R_i.a\,\theta\,R_j.b$, where $u$ is a value and $\theta \in \{=, \le, \ge\}$. The so-called single-variable predicates [22] of the form $R_i.a\,\theta\,u$ or $R_i.a\,\theta\,R_i.b$ can be processed as part of preprocessing before the join predicates of the form $R_i.a\,\theta\,R_j.b$ ($i \ne j$) are considered. This preprocessing can be facilitated if relations are sorted on some attributes and/or if indices already exist for them. If no index exists, each relation must be sequentially scanned once to eliminate those tuples which do not satisfy the single-variable predicates. In either case, the attributes that are not needed in subsequent operations should also be eliminated at this time. Experience has shown that this preprocessing is usually beneficial [20, 21]. Thus, in the rest of this paper, we consider only join predicates, and define $N_i$, $M_i$, and $t_i$ to be the sizes after preprocessing.

Now let $P$ = '$R_i.a\,\theta\,R_j.b$' be a join predicate. The selectivity factor $f_P$, with respect to this predicate, is defined to be the expected fraction of tuple pairs from $R_i$ and $R_j$ satisfying $P$. Therefore, out of the total of $N_i N_j$ tuple pairs of $R_i$ and $R_j$, $f_P N_i N_j$ tuple pairs satisfy $P$, on the average. Thus $f_P$ can be thought of as the probability that a randomly chosen tuple pair from $R_i$ and $R_j$ satisfies $P$. It follows from this definition that for a given tuple of $R_i$ ($R_j$) there are on the average $f_P N_j$ ($f_P N_i$) tuples of $R_j$ ($R_i$) that satisfy $P$.

For a given query $Q = P_1 \wedge P_2 \wedge \cdots \wedge P_m$, we assume that the predicates $\{P_j\}$ are independent in the sense that none of them is implied by any others. The query graph, $QG(Q)$, for query $Q$, has the node set $V = \{R_1, \ldots, R_n\}$, consisting of all relations referenced by $Q$ [5]. For each predicate $P_j$ = '$R_k.a\,\theta\,R_i.b$' of $Q$ there is an edge between $R_k$ and $R_i$ labeled by $P_j$. Thus a query graph is in general a labeled multigraph.
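The definitions of the selectivity factor and the query graph can be made concrete with a small sketch. The class and function names, the relation sizes, and the selectivities below are all hypothetical; only the formulas $f_P N_i N_j$ and $f_P N_j$ come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPredicate:
    name: str           # label such as "P1"
    left: str           # relation R_i
    right: str          # relation R_j
    selectivity: float  # f_P


def build_query_graph(predicates):
    """Adjacency lists of the labeled multigraph QG(Q): one edge per predicate."""
    graph = defaultdict(list)
    for p in predicates:
        graph[p.left].append((p.right, p.name))
        graph[p.right].append((p.left, p.name))
    return dict(graph)


def expected_pairs(p, n_tuples):
    """Expected number of tuple pairs satisfying P: f_P * N_i * N_j."""
    return p.selectivity * n_tuples[p.left] * n_tuples[p.right]


def expected_partners(p, n_tuples, given):
    """Expected number of tuples of the other relation that satisfy P
    together with one fixed tuple of relation `given`."""
    other = p.right if given == p.left else p.left
    return p.selectivity * n_tuples[other]


# Hypothetical sizes and selectivities for a two-predicate chain query.
n_tuples = {"R1": 1000, "R2": 2000, "R3": 500}
p1 = JoinPredicate("P1", "R1", "R2", 0.001)
p2 = JoinPredicate("P2", "R2", "R3", 0.01)
graph = build_query_graph([p1, p2])
```

Storing one adjacency entry per predicate, rather than per pair of relations, keeps parallel edges distinct, which is what makes the structure a multigraph.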


Fig. 1. A query graph representing an example query. ($R_1$, $R_2$, $R_3$: relations; $P_1$ = '$R_1.a\,\theta_1\,R_2.b$'; $P_2$ = '$R_2.c\,\theta_2\,R_3.d$'.)

One of our important assumptions is that the values of attributes are uniformly distributed, independently of each other.² Consider, as an example of a simple query, $Q = P_1 \wedge P_2$, represented by the graph in Figure 1, where $P_1$ = '$R_1.a\,\theta_1\,R_2.b$' and $P_2$ = '$R_2.c\,\theta_2\,R_3.d$'. Our assumption of independent distributions implies that the expected number of tuple combinations $(T_1, T_2, T_3)$ with $T_1 \in R_1$, $T_2 \in R_2$, and $T_3 \in R_3$ such that $T_1$ and $T_2$ ($T_2$ and $T_3$) satisfy $P_1$ ($P_2$) is given by

$$f_{P_1} f_{P_2} N_1 N_2 N_3.$$

We now briefly examine the effect of sorting a relation before performing a join by using the nested loops method. Consider again the example of Figure 1. If $R_3$ is sorted on attribute $d$, then about $f_{P_2} N_3$ consecutive tuples of $R_3$ satisfy $P_2$ with a given tuple of $R_2$. These tuples normally occupy $\lceil f_{P_2} N_3 t_3 / s \rceil = \lceil f_{P_2} M_3 \rceil$ consecutive pages, but since they may not start at a page boundary, they may occupy as many as $\lceil f_{P_2} M_3 \rceil + 1$ pages. We approximate the number by $f_{P_2} M_3$ for simplicity.

Suppose that we scan $R_1$, $R_2$, and $R_3$, using nested loops with $R_1$ ($R_3$) scanned in the outermost (innermost) loop. For $i = 1, 2$, and $3$ let $X_i$ be the pointer to a page of $R_i$ used in the nested scan of these relations and let $[X_i]$ denote the page of $R_i$ pointed to by $X_i$. Suppose the pages $[X_1]$ and $[X_2]$ are in main memory and a particular pair $(T_1, T_2)$ with $T_1 \in [X_1]$, $T_2 \in [X_2]$ is being examined. Since the tuples $T_3 \in R_3$ that satisfy $P_2$ with $T_2$ are localized on $f_{P_2} M_3$ consecutive pages, we only need to fetch $f_{P_2} M_3$ pages, which is fewer than the $M_3$ pages required if $R_3$ were not sorted on $d$. This type of saving usually more than compensates for the effort needed to sort a relation $R_i$ (e.g., $2 \lceil M_i \log_z M_i \rceil$ fetches if the $z$-way merge sort is used [7]). Hence it is assumed in subsequent discussions that each relation, except the one scanned in the outermost loop, is sorted before the execution of the nested loops method.

3. NESTING STRUCTURE

As stated in the Introduction, we use a nested loops program to scan the relations referenced in a query, assuming that each relation (except for the one scanned in the outermost loop) is sorted on an appropriate attribute. Consider the example shown in Figure 2, where the five relations are nested in the order $R_1$ (outermost), $R_2, \ldots, R_5$ (innermost). Figure 2 is actually a query graph in which the relations are arranged in a nesting order. For each $i \ge 2$ there is exactly one solid edge labeled by $P_i$ connecting $R_i$ to a relation $R_k$ with $k < i$, where $P_i$ = '$R_k.x\,\theta_i\,R_i.y$'. If a solid edge is labeled by $P_i$ = '$R_k.x\,\theta_i\,R_i.y$', we assume that $R_i$ is sorted on attribute $y$ and each tuple $T_1 \in R_k$ contains two pointers associated with $P_i$. One points to the first page of $R_i$ on which a tuple $T_2 \in R_i$ such that '$T_1.x\,\theta_i\,T_2.y$' is TRUE is stored, and the other points to the last such page. Since $\theta_i \in \{=, \le, \ge\}$, these pages are consecutive. As shown below, these pointers are used to fetch only those pages of $R_i$ which contain tuples that satisfy $P_i$ with some tuple(s) on the page of $R_k$ currently being examined (in main memory).

Fig. 2. Structure of a nested loops program. ($R_i$ is sorted with respect to attribute $x$ such that $R_i.x$ is the second operand of $P_i$.) Solid edges: $P_2$ = '$R_1.a\,\theta_2\,R_2.b$'; $P_3$ = '$R_1.c\,\theta_3\,R_3.d$'; $P_4$ = '$R_2.e\,\theta_4\,R_4.f$'; $P_5$ = '$R_2.g\,\theta_5\,R_5.h$'.

² Strictly speaking, this is impossible in a relational database since the same tuple cannot appear more than once in a relation. The assumption of independence is only an approximation, often adopted for convenience [4, 19].

Remark. To construct the pointers mentioned in the paragraph above, we need a list showing, for each page of $R_i$, the distinct values of $R_i.y$ which appear on it. We assume that this list is also constructed while sorting $R_i$ and can be kept in main memory until the pointers for $R_k$ with respect to $P_i$ = '$R_k.x\,\theta_i\,R_i.y$' are constructed, after which it can be destroyed. With this list available, the pointers from $R_k$ to $R_i$ with respect to $P_i$ can be constructed while sorting $R_k$ with no additional page fetches. In some cases, especially when the range of the attribute $R_i.y$ is small, some space saving can be achieved by not storing these pointers with each tuple of $R_k$. Instead, a separate index table for $R_k$ should be created and kept in main memory during the entire processing.

Consider the nested loops program in Figure 3, where the nesting order $R_1, R_2, \ldots, R_n$ is used without loss of generality. For $i = 1, 2, \ldots, n$, the program makes use of a pointer $X_i$ to a page of $R_i$, where $1 \le X_i \le M_i$. All combinations of $X_1, X_2, \ldots, X_n$ are considered using the usual nested loops with $X_1$ ($X_n$) scanning in the outermost (innermost) loop. As before, let $[X_i]$ denote the page of $R_i$ pointed to by $X_i$. For $i = 2, 3, \ldots, n$, when $X_i$ is advanced, $[X_i]$ is not automatically fetched, but a test is performed to see if it is necessary to do so.

    for X_1 := 1 until M_1 do begin
        for X_2 := 1 until M_2 do begin
            ...
                for X_n := 1 until M_n do begin
                    TEST-FETCH-OUTPUT
                end;
            ...
        end;
    end;

Fig. 3. Structure of a nested loops program.

Let $\mathrm{pred}(i)$ be the set of predicates of $Q$ which reference only relations $R_k$ with $k \le i$, that is, $\mathrm{pred}(i) = \{P \mid P = \text{'}R_k.a\,\theta\,R_h.b\text{'} \text{ and } k, h \le i\}$. Note that $\mathrm{pred}(1) = \emptyset$. In the example of Figure 2, for instance, we have $\mathrm{pred}(4) = \{P_2, P_3, P_4, P_6\}$. We also define a set of tuple combinations that satisfy the predicates in $\mathrm{pred}(i)$ as follows: for $X = (X_1, \ldots, X_n)$ and $i = 1, 2, \ldots, n$,

$$tc_X(i) = \{(T_1, \ldots, T_i) \mid \text{for } k = 1, 2, \ldots, i,\ T_k \in [X_k] \text{ and } (T_1, \ldots, T_i) \text{ satisfies all predicates in } \mathrm{pred}(i)\}.$$

Note that $tc_X(1)$ is the set of tuples on page $[X_1]$.

When the TEST-FETCH-OUTPUT part of the program in Figure 3 is executed, $X_1, X_2, \ldots, X_n$ point to a particular combination of pages. This subprogram is expanded in Figure 4, and scans the tuples of these pages, again in the nested loops fashion. The outermost loop is shown in Figure 4(a), while an intermediate loop and the innermost loop are shown in Figures 4(b) and 4(c), respectively. For $i = 1, 2, \ldots, n$, let $Y_i$ be the pointer to a tuple in $[X_i]$ and let $T_i(Y_i)$ be the tuple pointed to by $Y_i$. (We thus have $Y_i \le \lfloor s/t_i \rfloor$, i.e., the number of tuples in one page of $R_i$.) The page $[X_i]$ is fetched only if the tuple combination $(T_1(Y_1), T_2(Y_2), \ldots, T_{i-1}(Y_{i-1}))$ belongs to $tc_X(i-1)$ and $X_i$ lies between the two pointers associated with $T_k(Y_k)$, where $P_i$ = '$R_k.a\,\theta\,R_i.b$'. (Recall that one of them points to the first page of $R_i$ on which a tuple $T_i \in R_i$ appears such that $T_k(Y_k).a$ and $T_i.b$ satisfy $P_i$, and the other points to the last such page.) In Figure 4(b), we test only the predicates in $\mathrm{pred}(i) - \mathrm{pred}(i-1)$, since $(T_1(Y_1), \ldots, T_{i-1}(Y_{i-1}))$ has already passed the test for $\mathrm{pred}(i-1)$ in the surrounding loop. Finally, the innermost loop, that is, the $n$th loop shown in Figure 4(c), has a provision for outputting the tuple combinations which are in $tc_X(n)$ projected on the appropriate attributes.

4. DERIVATION OF A COST FORMULA
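The quantity analyzed in this section is the number of page fetches made by the program of Section 3. As a rough companion, the sketch below runs a simplified version of that program on in-memory lists and counts fetches. It rests on assumptions of ours rather than the paper's: every predicate is an equality, each relation after the first has exactly one predicate to an earlier relation, one page per relation is buffered, pages are numbered from 0, and the two pointers are recomputed by binary search instead of being stored with each tuple. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import product


def paginate(rows, per_page):
    return [rows[i:i + per_page] for i in range(0, len(rows), per_page)]


def pointer_pair(sorted_rows, attr, value, per_page):
    """First and last page of a relation sorted on `attr` that hold `value`,
    or None when no tuple matches."""
    keys = [row[attr] for row in sorted_rows]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    if lo == hi:
        return None
    return lo // per_page, (hi - 1) // per_page


def nested_loops_join(relations, joins, per_page):
    """relations[i] (i >= 1) must be sorted on its join attribute.
    joins[i] = (k, a, b) encodes P_i = 'R_k.a = R_i.b'; joins[0] is None."""
    pages = [paginate(rows, per_page) for rows in relations]
    buffered = [None] * len(relations)   # page held in memory per relation
    fetches = [0] * len(relations)
    output = []

    def fetch(i, x):
        # "fetch [X_i] unless it is already in the main memory"
        if buffered[i] != x:
            buffered[i] = x
            fetches[i] += 1

    def loop(i, X, combo):
        if i == len(relations):
            output.append(tuple(combo))
            return
        if i > 0:
            k, a, b = joins[i]
            ptr = pointer_pair(relations[i], b, combo[k][a], per_page)
            if ptr is None or not (ptr[0] <= X[i] <= ptr[1]):
                return                   # page [X_i] cannot contain a partner
        fetch(i, X[i])
        for row in pages[i][X[i]]:
            if i == 0 or row[joins[i][2]] == combo[joins[i][0]][joins[i][1]]:
                loop(i + 1, X, combo + [row])

    for X in product(*(range(len(p)) for p in pages)):
        loop(0, X, [])
    return output, fetches


r1 = [{"a": 1}, {"a": 2}, {"a": 3}, {"a": 4}]
r2 = [{"b": 1, "c": 10}, {"b": 2, "c": 20}, {"b": 2, "c": 30}, {"b": 5, "c": 10}]
r3 = [{"d": 10}, {"d": 20}, {"d": 40}, {"d": 50}]
output, fetches = nested_loops_join(
    [r1, r2, r3], [None, (0, "a", "b"), (1, "c", "d")], per_page=2)
print(len(output), fetches)   # 2 [2, 2, 1]
```

With two pages per relation, plain nested iteration would fetch 2 + 4 + 8 = 14 pages; the pointer test brings this down to 5 on this toy input.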

(a)
    for Y_1 := 1 until ⌊s/t_1⌋ do begin
        [2nd loop]
    end;

(b)
    k := parent(i); (i.e., there is an arc from R_k to R_i.)
    if X_i lies between the two pointer values of T_k(Y_k) associated with P_i then do begin
        fetch [X_i] unless it is already in the main memory;
        for Y_i := 1 until ⌊s/t_i⌋ do begin
            if (T_1(Y_1), ..., T_i(Y_i)) satisfies all predicates in pred(i) - pred(i - 1) then do begin
                [(i + 1)st loop]
            end;
        end;
    end;

(c)
    k := parent(n);
    if X_n lies between the two pointer values of T_k(Y_k) associated with P_n then do begin
        fetch [X_n] unless it is already in the main memory;
        for Y_n := 1 until ⌊s/t_n⌋ do begin
            if (T_1(Y_1), ..., T_n(Y_n)) satisfies all predicates in pred(n) - pred(n - 1) then do begin
                output the projection of (T_1(Y_1), ..., T_n(Y_n)) on the desired set of attributes
            end;
        end;
    end;

Fig. 4. TEST-FETCH-OUTPUT subprogram. (a) The outermost loop; (b) the ith loop; (c) the innermost loop.

In this section we derive a formula for the expected number of page fetches from $R_i$ necessary in the nested loops program given in Section 3. Let

$$F_i = \prod_{P \in \mathrm{pred}(i)} f_P.$$

By the assumption of independent distributions there are, on the average, $F_{i-1} N_1 \cdots N_{i-1}$ combinations of tuples from $R_1, \ldots, R_{i-1}$ which satisfy all predicates in $\mathrm{pred}(i-1)$, amounting to

$$I_{i-1} = F_{i-1} N_1 \cdots N_{i-1} / (M_1 \cdots M_{i-1}) \tag{4.1}$$

tuple combinations per page combination of $R_1, \ldots, R_{i-1}$. Let $P_i$ = '$R_k.a\,\theta\,R_i.b$' and suppose that these $I_{i-1}$ tuple combinations involve $J_{i-1}$ distinct tuples (on the average) on each page of $R_k$. To derive a formula for $J_{i-1}$, recall that one page of $R_k$ has $s/t_k$ tuples. Therefore, if we choose $I_{i-1}$ tuples from such a page with repetitions, the probability that a given tuple is not chosen is $(1 - t_k/s)^{I_{i-1}}$, and therefore the expected number of distinct tuples chosen from a page of $R_k$ is

$$J_{i-1} = (s/t_k)\left[1 - (1 - t_k/s)^{I_{i-1}}\right]. \tag{4.2}$$
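Equation (4.2) is the standard expected number of distinct items seen when drawing with repetitions, and it can be checked against a small Monte Carlo experiment. In the sketch below `m` stands for the number of tuples per page, $s/t_k$, and `draws` for $I_{i-1}$ (restricted to an integer for the simulation); the function names, the trial count, and the seed are arbitrary choices of ours.

```python
import random


def expected_distinct(m, draws):
    """Closed form of Eq. (4.2): m * [1 - (1 - 1/m)^draws]."""
    return m * (1.0 - (1.0 - 1.0 / m) ** draws)


def simulated_distinct(m, draws, trials=5000, seed=0):
    """Average number of distinct tuples hit when `draws` tuples are chosen
    uniformly at random, with repetitions, from a page of m tuples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(m) for _ in range(draws)})
    return total / trials


print(expected_distinct(20, 10), simulated_distinct(20, 10))
```

The same closed form, with $r_k$ in place of $s/t_k$, is what the derivation applies next to count distinct attribute values.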

Note that these $J_{i-1}$ tuples may not have distinct $R_k.a$ values. To find the number of distinct $R_k.a$ values involved, let $r_k$ be the number of distinct values that $R_k.a$ can take. Then, because of our uniform distribution assumption, any of the $r_k$ values is equally likely. If $J_{i-1}$ of them are chosen at random with repetitions,³ we have on the average

$$K_{i-1} = r_k\left[1 - (1 - 1/r_k)^{J_{i-1}}\right] \tag{4.3}$$

distinct $R_k.a$ values. For each of these distinct values, $f_{P_i} M_i$ pages of $R_i$ are fetched. We finally obtain the expected number of pages of $R_i$ that need to be fetched per combination of pages from $R_1, \ldots, R_{i-1}$ as

$$L_i = M_i\left[1 - (1 - f_{P_i})^{K_{i-1}}\right], \tag{4.4}$$

since the same page need not be fetched more than once. Thus the total number of page fetches from $R_i$ is given by

$$H_i = (M_1 \cdots M_{i-1}) L_i = M_1 \cdots M_i\left[1 - (1 - f_{P_i})^{K_{i-1}}\right] \tag{4.5}$$

for $i = 2, 3, \ldots, n$. By definition, we have $K_{i-1} \le J_{i-1} \le I_{i-1}$. Further, it is clear from Eqs. (4.2), (4.3), and (4.4) that $J_{i-1} \le s/t_k$, $K_{i-1} \le r_k$, and $L_i \le M_i$. It can also be shown that $K_{i-1} \le J_{i-1}$. It follows that the total number of page fetches required in the nested loops method is

$$\sum_{i=1}^{n} H_i, \tag{4.6}$$

where $H_1 = M_1$, and $H_i$ is given by (4.5). Finally, the total number of page fetches required in our scheme is given by

$$\mathrm{Cost} = \sum_{i=1}^{n} \left(H_i + S(P_i)\right), \tag{4.7}$$

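where $S(P_i)$ is the number of page fetches needed for sorting $R_i$ on $R_i.b$, that is,

$$S(P_i) = \begin{cases} 0 & \text{if } i = 1 \text{ or } R_i \text{ is sorted on } R_i.b \text{ prior to the evaluation of query } Q, \\ 2 \lceil M_i \log_z M_i \rceil & \text{otherwise,} \end{cases}$$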

assuming that the z-way merge sort [7] is used, as discussed in Section 2.
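Equations (4.1) through (4.7) chain together mechanically, so they can be evaluated by a short routine. The sketch below is one possible reading of the formulas under an assumption that is ours: the query is tree-shaped, so that $P_i$ is the only predicate in $\mathrm{pred}(i) - \mathrm{pred}(i-1)$ and $F_{i-1}$ is the product of the selectivities of $P_2, \ldots, P_{i-1}$. The class, field, and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Relation:
    n_tuples: float                 # N_i
    tuple_bytes: float              # t_i
    parent: int = -1                # position k of R_k in the order (unused for R_1)
    selectivity: float = 1.0        # f_{P_i}
    parent_distinct: float = 1.0    # r_k, distinct values of R_k.a
    presorted: bool = False         # True if no sort is needed


def expected_cost(order, page_bytes, merge_ways=2):
    """Return (Cost, [H_1, ..., H_n]) for relations listed in nesting order."""
    pages = [r.n_tuples * r.tuple_bytes / page_bytes for r in order]  # M_i
    fetches = [pages[0]]            # H_1 = M_1
    sort_cost = 0.0                 # S(P_1) = 0
    F = 1.0                         # F_{i-1}; pred(1) is empty
    prod_n = order[0].n_tuples      # N_1 ... N_{i-1}
    prod_m = pages[0]               # M_1 ... M_{i-1}
    for i in range(1, len(order)):
        r = order[i]
        per_page = page_bytes / order[r.parent].tuple_bytes           # s / t_k
        I = F * prod_n / prod_m                                       # (4.1)
        J = per_page * (1 - (1 - 1 / per_page) ** I)                  # (4.2)
        K = r.parent_distinct * (1 - (1 - 1 / r.parent_distinct) ** J)  # (4.3)
        L = pages[i] * (1 - (1 - r.selectivity) ** K)                 # (4.4)
        fetches.append(prod_m * L)                                    # (4.5)
        if not r.presorted:
            sort_cost += 2 * math.ceil(pages[i] * math.log(pages[i], merge_ways))
        F *= r.selectivity
        prod_n *= r.n_tuples
        prod_m *= pages[i]
    return sum(fetches) + sort_cost, fetches                          # (4.7)


outer = Relation(n_tuples=1000, tuple_bytes=100)
inner = Relation(n_tuples=1000, tuple_bytes=100, parent=0,
                 selectivity=0.001, parent_distinct=50)
cost, per_relation = expected_cost([outer, inner], page_bytes=1000)
print(cost, per_relation)
```

Evaluating `expected_cost` on every permutation of a small set of relations is a direct, exponential-time way to compare nesting orders; it says nothing about how the heuristic of Section 5 proceeds.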

³ See the previous footnote. Eq. (4.3) is only an approximation for $K_{i-1}$. For a more precise treatment, see [27].

Fig. 5. Behavior of $1 - (1 - f_{P_i})^{K_{i-1}}$. (Horizontal axis: $K_{i-1}$, from 0 to 40.)
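The function plotted in Figure 5 is easy to tabulate. In the sketch below the selectivity values are arbitrary illustrations (the values used for the figure are not recoverable from the text), and the function names are hypothetical. The second helper gives the elementary lower bound $1 - e^{-fK}$, which follows from $1 - f \le e^{-f}$.

```python
import math


def fetch_fraction(f, k):
    """Fraction of R_i's pages fetched per page combination: 1 - (1 - f)^K."""
    return 1.0 - (1.0 - f) ** k


def exp_lower_bound(f, k):
    """Lower bound 1 - exp(-f*K), valid because 1 - f <= exp(-f)."""
    return 1.0 - math.exp(-f * k)


for f in (0.01, 0.05, 0.2):                 # illustrative selectivities
    row = [round(fetch_fraction(f, k), 3) for k in range(0, 41, 10)]
    print(f"f = {f}: {row}")
```

Whenever $fK > 5$ the bound already exceeds $1 - e^{-5} \approx 0.993$, in line with the observation made in Section 4.1 that the function is then very close to one.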

4.1 Approximations

In order to visualize the behavior of the formula in Eq. (4.5), the function $[1 - (1 - f_{P_i})^{K_{i-1}}]$ is plotted in Figure 5 for various values of $f_{P_i}$. It is seen that if $f_{P_i} K_{i-1} > 5$, then the value of the above function is very close to one and, therefore, by Eq. (4.5), $H_i$ may be approximated by $M_1 M_2 \cdots M_i$. This means that if $f_{P_i} K_{i-1} > 5$, the effect of pointers for $P_j$ ($j = 2, \ldots, i$), intended to reduce the number of page fetches, is negligible. In most practical cases, however, we probably have $f_{P_i} K_{i-1} < 1$ and $[1 - (1 - f_{P_i})^{K_{i-1}}]$
