Query Optimisation in Distributed Object-Oriented Database Systems*

W. SUN†, W. MENG AND C. YU
Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, Illinois 60680, USA
†School of Computer Science, Florida International University, University Park, Miami, Florida 33199, USA

In this paper, query processing and optimisation in distributed object-oriented database systems are discussed. The processing and optimisation of typical queries, called chain queries, in distributed object-oriented database systems are investigated in detail. An algorithm with complexity of $O(n^3 * h_1)$ to minimise the total cost is provided using dynamic programming, where $n$ is the number of classes referenced in the query, $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites in the network. A wide range of issues is addressed and uniformly integrated into our basic solution to the problem. These issues include sorted states of classes; local processing of selections and projections; allowing multiple intermediate results; arbitrary target class at an arbitrary answer site; replicated data; class hierarchies (which capture the IS-A relationship among objects); different sites with different processing speeds; and communication lines between different sites with different transfer speeds. The uniformity of this algorithm under so many diversified situations strongly demonstrates the usefulness and the flexibility of the algorithm.

Received November 1990, revised February 1991

1. INTRODUCTION

In this section, background information is provided. First, the problem is presented. In Section 1.2, a basic cost model (of class traversals) is introduced. The differences between the proposed approach and previous research are pointed out in Section 1.3.

1.1 The problem

Object-oriented programming has been introduced into database systems recently. Several prototypes of object-oriented database systems (OODBs) have been and/or are being developed, such as Iris [8, 28], GemStone [10, 29], Postgres [37, 38], ORION [7, 22, 24], O2 [6, 16], Starburst [18] and Exodus [11]. Many research results indicate that object-oriented databases can be applied in various large-scale and complicated application domains [2-5, 19, 27, 31, 34, 40, 41]. Objects are grouped into classes [23, 25]. Objects that belong to a class are also called instances of the class. Class hierarchies can be defined to capture the IS-A relationship among objects in different classes. A higher/lower level class is called a superclass/subclass. Attributes specified for a class are inherited (shared) by all its subclasses, and more specific attributes can be defined for subclasses. The domain of an attribute is a class. If the domain of attribute B is class D, then attribute B can take an instance of class D, or of one of the subclasses of D, as its value. It is assumed that each object is assigned a system-wide unique identifier (UID). A class can be a primitive class (such as the string and integer classes) or a non-primitive one with a set of attributes defined. The value of an attribute is a domain value if its domain is a primitive class; otherwise it is a reference to (the object identifier of) an instance of the domain class. As pointed out by Ullman [39], unique identification of objects is one of the most important characteristics that differentiate an OODB system from a relational database system. This characteristic will be fully exploited by our approach.

* This research is supported in part by MCC and in part by NSF under IRI-8901789.

Refs 25 and 21 are a good survey paper and reference book, respectively, for object-oriented concepts and databases. In our previous research [38] we discussed query optimisation in a centralised OODB system. This paper discusses query optimisation in distributed OODBs. Examples from ORION are used to illustrate our approach [7]. Since ORION, which has exploited many object-oriented characteristics such as class hierarchy, composition hierarchy and unique object identification, is a representative example of OODBs, we believe our result is applicable to many other OODBs [21, 25]. The following is an ORION database schema (class hierarchies are ignored), and an example query Q1 against this database schema [7]:

Auto: manufacturer, weight, colour, owner -> Person
Person: age, hometown -> City
City: name, state -> State
State: name, population

select Auto where owner.hometown.state.population > 10000

Figure 1. A sample database schema and a query.

The query, using notations from Ref. 46, is to find all automobiles owned by persons who live in cities within states with populations over 10000. Class Person is the domain of attribute owner of class Auto; the domain of attribute population in class State is the primitive class integer, and so on. Auto-ID, Person-ID, City-ID and State-ID are unique object identifiers for classes Auto, Person, City and State, respectively. They are automatically generated for objects and maintained by OODBs. If we neglect the output and the selection of the query, and properly rename some attributes, its query graph becomes [7]:

THE COMPUTER JOURNAL, VOL. 35, NO. 2, 1992

Downloaded from https://academic.oup.com/comjnl/article-abstract/35/2/98/360312 by guest on 02 December 2017

Auto [Person-ID] -> Person [Person-ID, City-ID] -> City [City-ID, State-ID] -> State [State-ID]

Figure 2. Query graph of a sample query.

high communication costs. Therefore, navigational methods will not be considered in this paper.
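As an illustration of how the sample query reduces to a chain of joins on object identifiers, the following minimal Python sketch evaluates it over a toy database. The dictionaries, the UID strings and the helper `functional_join` are hypothetical stand-ins introduced only for this example; they are not ORION structures, and the right-to-left evaluation order is just one of the possible orders.

```python
# Hypothetical in-memory stand-ins for the classes of Figure 1. Each dict maps
# a UID to the UID it references (the chaining attribute of Figure 2).
auto_owner = {"a1": "p1", "a2": "p2", "a3": "p1"}   # Auto-ID -> Person-ID
person_hometown = {"p1": "c1", "p2": "c2"}          # Person-ID -> City-ID
city_state = {"c1": "s1", "c2": "s2"}               # City-ID -> State-ID
state_population = {"s1": 50000, "s2": 8000}        # State-ID -> population


def functional_join(left, right_uids):
    """Keep the (uid, ref) pairs of `left` whose ref occurs in `right_uids`."""
    return {uid: ref for uid, ref in left.items() if ref in right_uids}


def q1(min_population=10000):
    """Evaluate the sample query by chaining functional joins right to left."""
    states = {s for s, pop in state_population.items() if pop > min_population}
    cities = functional_join(city_state, states)
    persons = functional_join(person_hometown, cities.keys())
    autos = functional_join(auto_owner, persons.keys())
    return sorted(autos)


result = q1()
```

With the toy data above, only state `s1` passes the selection, so the chain keeps city `c1`, person `p1` and the two automobiles that person owns.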

1.2 Basic class traversal methods and class hierarchies

Attributes owner, hometown and state are complex attributes which take UIDs of class Person (Person-ID), class City (City-ID) and class State (State-ID) as their values, respectively. The attributes Person-ID, City-ID and State-ID may be invisible to the users. If class Y is the domain class for attribute X, we draw an arrow from X to Y. This is an example of a typical query called a chain query. In a chain query there is a common 'chaining' attribute between adjacent classes, but there is no common 'chaining' attribute between non-adjacent classes. We use $C_i$ to denote an original class, and $C_{i,i}$ to represent the simplified class (of $C_i$) containing only the ID attributes which are chained to its adjacent classes, as shown in Figure 3. Thus the general form of a chain query is:

$C_{1,1}[ID_2] \rightarrow C_{2,2}[ID_2, ID_3] \rightarrow \cdots \rightarrow C_{n,n}[ID_n]$

Figure 3. Query graph of generalised chain query.

where $ID_i$ is the attribute that uniquely identifies objects in class $C_i$, $1 \le i \le n$. In OODBs, the chaining from one class to another comes about naturally through the use of complex attributes as shown above. In this paper we shall restrict ourselves mostly to chain-query optimisation. It can be observed that chaining of classes involved in a query graph corresponds to joining in relational databases (referred to as a functional join [11, 46]; in ORION it is called 'class traversal'. We will use traversals and (functional) joins interchangeably). If class $C_{i,i}$ is considered as a relation, then attribute $ID_i$ of $C_{i,i}$ becomes the 'join' attribute between $C_{i,i}$ and $C_{i-1,i-1}$, and attribute $ID_{i+1}$ of $C_{i,i}$ becomes the 'join' attribute between $C_{i,i}$ and $C_{i+1,i+1}$, $2 \le i \le n-1$. Let the above 'join' be denoted by $\otimes$. Further, let

$$C_{i,j}[ID_i, ID_{j+1}] = C_{i,i} \otimes C_{i+1,i+1} \otimes \cdots \otimes C_{j,j}, \quad 2 \le i \le j < n,$$
$$C_{1,j}[ID_{j+1}] = C_{1,1} \otimes C_{2,2} \otimes \cdots \otimes C_{j,j}, \quad j < n,$$
$$C_{i,n}[ID_i] = C_{i,i} \otimes C_{i+1,i+1} \otimes \cdots \otimes C_{n,n}, \quad i \ge 2,$$

where the two attributes within the brackets are the attributes of the intermediate results $C_{i,j}$. $C_{1,j}$ and $C_{i,n}$ are special intermediate results in that they only have a single attribute, since we only need to keep the join attributes needed in subsequent join operations. (Attribute $ID_1$ in $C_{1,j}$ and attribute $ID_n$ in $C_{i,n}$ will never be used for subsequent joins, therefore they are abandoned.) Furthermore, whenever a join has been performed the attribute participating in the join will be abandoned, so that the intermediate results $C_{i,j}$ only consist of UID pairs $[ID_i, ID_{j+1}]$. Obviously, $C_{1,n}$ is the answer (if we ignore the target class), which can be obtained from $C_{1,k}[ID_{k+1}] \otimes C_{k+1,n}[ID_{k+1}]$ for some $k$, $1 \le k < n$ [...] where F2 is independent of whether these two


arguments are sorted or not. RNL is likely to be better than FNL if the number of objects in $C_{j+1,k}$ is much smaller than the number of objects in $C_{i,j}$.

SM (Sort Merge), namely,

$$C_{i,k}[ID_i, ID_{k+1}] = C_{i,j}[ID_i, ID_{j+1}] \overset{SM}{\otimes} C_{j+1,k}[ID_{j+1}, ID_{k+1}].$$

This is the standard sort-merge method. If $C_{j+1,k}$ is not sorted on $ID_{j+1}$, sort it on $ID_{j+1}$ (note that $C_{j+1,k}$ could be sorted on $ID_{j+1}$ without explicitly performing a sorting operation; this happens when $C_{j+1,k}$ is an original class or an intermediate result obtained by using the FNL join method while the first argument of FNL is sorted on its first attribute). Sort $C_{i,j}$ on $ID_{j+1}$. (This is necessary, since no matter whether $C_{i,j}$ is an original or an intermediate class, $C_{i,j}$ is not sorted on its second attribute.) Then merge these two sorted classes. Therefore, we have:

so that, in any case, the obtained intermediate result $C_{i,k}$ will not be sorted, irrespective of whether $C_{i,j}$ and/or $C_{j+1,k}$ is sorted or not. The cost to apply the SM method is:

$$SM(C^{*}_{i,j}, C_{j+1,k}) = Sort(C_{i,j}) + Sort(C_{j+1,k}) + Merge(C_{i,j}, C_{j+1,k})$$
$$SM(C^{*}_{i,j}, C^{s}_{j+1,k}) = Sort(C_{i,j}) + Merge(C_{i,j}, C_{j+1,k}),$$

where Sort(X) is the cost to sort class X and Merge(X, Y) is the cost to merge two sorted classes X and Y. Deriving the estimated costs for Sort and Merge is straightforward and standard. For the SM method, if the second class is sorted the cost will be smaller because sorting the second class is not needed. For each of the three methods, whether the first class is sorted on $ID_i$ or not makes no difference in cost estimation. However, if the first class is sorted on $ID_i$, FNL will produce an intermediate result in sorted form, which may reduce the cost of some subsequent SM join operations; the other two methods produce unsorted intermediate results. Note that none of these methods can produce intermediate results sorted on their second attributes.

Some OODBs such as O2 [16] support clustering of objects in their physical storage. When clustering is used and original classes are involved in the join, the above cost estimations for FNL, RNL and SM may need to be modified. Also, objects of classes may no longer be stored in ascending order of their IDs initially. The algorithm to be presented in this paper subsumes this situation, i.e. the proposed algorithm is applicable regardless of whether the original classes are sorted or unsorted. Since it can be observed from the above formalism that detailed cost estimations for FNL, RNL and SM have no significant effect on subsequent discussions, clustering will not be discussed any further.

Now we proceed to incorporate class hierarchies into the above formulation. We note that operands involved in the above FNL, RNL and SM are either original classes or intermediate results. However, these operands in class traversals can actually involve class hierarchies. For example, if classes Vehicle and Bike are subclasses of Auto, and the class Auto in query Q1 (see Figure 1) is replaced by Auto* [7, 22], then the query is to find all automobile owners (including vehicle and bike owners) who live in cities within states with a population over 10000. That is, if A is a class, then A* is used to represent the class hierarchy rooted at A. In this example, the class hierarchy rooted at Auto, rather than the class Auto only, needs to be traversed. When class hierarchies are involved in FNL, RNL and SM, cost-estimation formulas need to be modified by taking the following into consideration: ordering of objects in a hierarchy, and class hierarchy index (the index is established on the class hierarchy instead of on a single class [22]). Details about how these factors can affect the above cost-estimation formulas can be found in our previous research result [38]. Clearly, whether class hierarchies are involved in a query or not, only the basic cost-estimation formulas for FNL, RNL and SM can be affected. Thus, after these cost-estimation formulas are obtained, we no longer need to distinguish classes with class hierarchies; therefore, in later discussions we assume, without loss of generality, that classes are simple classes.

1.3 Literature review

Dynamic programming techniques have been employed by various workers in relational query optimisation [12, 14, 26, 32, 33, 43, 45]. However, there are significant differences between the solution we propose and earlier ones in addition to the difference in the underlying data models.

(1) The cost model in our formalism is much more realistic than those given by other researchers.

Local reductions (selections/projections). Local reductions are either not addressed [26] or always performed before joins [14, 32, 33]. However, Ref. 1 demonstrates that performing local reductions before joins may not yield optimal results. Our solution allows local reductions and joins to be performed in any order, that is, local reductions may be performed before joins, during joins, and/or after joins. Moreover, in this paper reductions can be carried out by using either indices or sequential scanning.

Sorted state. A base relation or an intermediate result is said to be in a sorted state if the tuples of the relation are sorted in ascending order of an attribute to be joined with some other base/intermediate relations in subsequent joins. Sort-merge join will be cheaper if relations are already sorted on the joining attributes. It is important to note that although FNL join may have a higher cost than other join methods in some cases, the sorted intermediate result it yields can reduce the cost of subsequent (sort-merge) joins. This means that local optimisation may not guarantee overall optimality. That is, without taking the sorted state of classes and intermediate results into consideration, overall optimality may not be achieved. In most previous papers, the issue of sorted state was not addressed. Selinger and Adiba [33] only briefly mentioned a distinction between a sorted relation and an unsorted one. Our solution takes sorted states of original classes as well as intermediate results into full consideration.

Bushy plan. In Refs 14, 32 and 33 it is required that at least one of the relations involved in a join be a base relation. In our formalism (also known as the 'bushy plan'), joins between two intermediate results are allowed. It is known that a strategy which allows at most one intermediate result during query processing may not necessarily yield an optimal result.

(2) In Refs 12, 43 and 45, only data communication costs involving semi-joins are considered. However, various researchers have demonstrated that in a distributed database system communication cost may not necessarily be a dominating factor. We take both local processing costs and communication costs into consideration. Furthermore, our approach allows different sites to have different processing speeds, and communication lines between different sites to have different communication speeds.

(3) Our algorithm is able to uniformly handle many different situations, including arbitrary target class at an arbitrary answer site, different sites with different processing speeds, replicated copies of classes at different sites, different communication speeds between different sites and class hierarchies. The uniformity of this algorithm under so many diversified situations strongly demonstrates the usefulness and the flexibility of the algorithm. A single, uniform algorithm capable of handling such a wide range of situations efficiently has not been reported before.

(4) The algorithm is shown to require $O(n^3 * h_1)$ complexity in minimising the total cost, where $n$ is the number of classes specified in the query, $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites in the distributed environment.

And finally, with slight modification, this algorithm is applicable to query optimisation in relational database systems.

The rest of the paper is organised as follows: our previous solution for centralised environments is briefly reviewed in Section 2. Minimising the total cost in a distributed environment is discussed in detail in Section 3, where an optimal algorithm with complexity $O(n^3 * h_1)$ is provided, where $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites; extensions to various situations in a distributed environment are also discussed in this section.

2. REVIEW OF A SOLUTION FOR CENTRALISED ENVIRONMENTS

We first briefly review a solution we obtained for a centralised environment [38]. Let $jc(C_{i,j}, C^{t}_{j+1,k})$ be the minimum cost to functionally join (join, for short) $C_{i,j}$ and $C_{j+1,k}$, where if $t = s$ the second class is sorted (on its first attribute). As discussed above, a superscript for the first class $C_{i,j}$ is not needed, since no matter whether $C_{i,j}$ is sorted on $ID_i$ or not, it makes no difference to the join cost. Thus,

$$jc(C_{i,j}, C^{s}_{j+1,k}) = \min\{FNL(C_{i,j}, C_{j+1,k}),\ RNL(C_{i,j}, C_{j+1,k}),\ SM(C_{i,j}, C^{s}_{j+1,k})\}$$
$$jc(C_{i,j}, C_{j+1,k}) = \min\{FNL(C_{i,j}, C_{j+1,k}),\ RNL(C_{i,j}, C_{j+1,k}),\ SM(C_{i,j}, C_{j+1,k})\}. \qquad (1)$$

Similarly, let $jc^{s}(C_{i,j}, C_{j+1,k})$ be the minimum cost to join $C_{i,j}$ and $C_{j+1,k}$ such that the intermediate result $C_{i,k}$ is sorted. Then,

$$jc^{s}(C_{i,j}, C_{j+1,k}) = \min\{jc(C_{i,j}, C_{j+1,k}) + Sort(C_{i,k}),\ Sort(C_{i,j}) + FNL(C^{s}_{i,j}, C_{j+1,k})\} \qquad (2)$$

where, in the first expression, intermediate result $C_{i,k}$ is first obtained in unsorted form and then sorted explicitly; the second expression is for explicitly sorting $C_{i,j}$ first and then applying FNL to obtain $C^{s}_{i,k}$. Formula (3) can be understood similarly.

Before we present an algorithm for a centralised environment, the following are assumed: (1) no local reduction; (2) no target class is considered. The relaxation of the two assumptions in a centralised environment can be found in Ref. 38.

Our solution will find all minimum costs for computing $C_{i,j}$ and $C^{s}_{i,j}$ for all pairs of $i$ and $j$ such that $j - i = k$, $1 \le i \le j \le n$, $0 \le k \le n - 1$. Initially, $k = 0$. In each iteration, $k$ is incremented by 1 until $k = n - 1$. Let $Cost_{i,j}$ and $Cost^{s}_{i,j}$ be the minimum costs for computing $C_{i,j}$ and $C^{s}_{i,j}$, respectively. This is a bottom-up strategy, as shown in Figure 4. The $k$th row indicates the costs for computing $C_{i,j}$ for $j - i = k$, and computations are done in the order of $k = 0, \ldots, n - 1$. There is also a $Cost^{s}_{i,j}$ matrix, which is similar to the $Cost_{i,j}$ matrix except that the latter computes the costs of $C_{i,j}$. The $Cost^{s}_{i,j}$ matrix will be constructed synchronously with the $Cost_{i,j}$ matrix, i.e. as soon as a row of $Cost_{i,j}$ is computed, the corresponding row of $Cost^{s}_{i,j}$ is computed. This is repeated until $C_{1,n}$ is computed. $C_{i,j}$ is formed by computing $C_{i,m} \otimes C_{m+1,j}$ for some $m$, $i \le m < j$:

$$Cost_{i,j} = \min_{i \le m < j}\{Cost_{i,m} + Cost_{m+1,j} + jc(C_{i,m}, C_{m+1,j}),\ Cost_{i,m} + Cost^{s}_{m+1,j} + jc(C_{i,m}, C^{s}_{m+1,j})\}. \qquad (4)$$

[...] $C_{1,4}$ can be constructed with $m \in \{1, 2, 3\}$ using Equation (4).

3. A SOLUTION FOR DISTRIBUTED ENVIRONMENTS

In this section we investigate chain query optimisation in a distributed environment. In Section 3.1 we present an algorithm that computes $C_{1,n}$ with the minimum total cost under assumptions similar to those used in Section 2, that is, no local reduction and no target class are considered. In Section 3.2, processing of queries involving selections and projections, in addition to joins, is studied. Section 3.3 extends the algorithm by allowing an arbitrary target class and an arbitrary answer site.
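As a reference point for the distributed algorithm, the bottom-up computation of the two cost tables reviewed in Section 2 can be sketched in Python as follows. This is only a sketch under stated assumptions: the callbacks `jc`, `fnl` and `sort_cost` and all numeric values are hypothetical placeholders for the paper's cost-estimation formulas, and the rule used for the sorted table (explicit sort, or FNL with a sorted first argument) is inferred from the description in the text rather than copied from a formula.

```python
import math


def chain_dp(n, cost0, cost0_sorted, jc, fnl, sort_cost):
    """Fill the unsorted and sorted cost tables for segments [i, j], 1 <= i <= j <= n.

    cost0[i], cost0_sorted[i]: cost of having original class i unsorted / sorted.
    jc(i, m, j, second_sorted): cheapest join of C[i,m] with C[m+1,j] (assumed).
    fnl(i, m, j): FNL join cost when the first argument is sorted (assumed).
    sort_cost(i, j): cost of explicitly sorting C[i,j] (assumed).
    """
    cost, cost_s = {}, {}
    for i in range(1, n + 1):
        cost[i, i] = cost0[i]
        cost_s[i, i] = cost0_sorted[i]
    for k in range(1, n):                      # k = j - i, processed bottom-up
        for i in range(1, n - k + 1):
            j = i + k
            best, best_s = math.inf, math.inf
            for m in range(i, j):              # split point, i <= m < j
                best = min(
                    best,
                    cost[i, m] + cost[m + 1, j] + jc(i, m, j, False),
                    cost[i, m] + cost_s[m + 1, j] + jc(i, m, j, True),
                )
                # FNL with a sorted first argument yields a sorted result.
                best_s = min(best_s, cost_s[i, m] + cost[m + 1, j] + fnl(i, m, j))
            cost[i, j] = best
            # Alternatively, build the unsorted result and sort it explicitly.
            cost_s[i, j] = min(best_s, best + sort_cost(i, j))
    return cost, cost_s


# Hypothetical constant costs, chosen only to make the example easy to follow.
cost, cost_s = chain_dp(
    n=3,
    cost0={1: 0, 2: 0, 3: 0},
    cost0_sorted={1: 2, 2: 2, 3: 2},
    jc=lambda i, m, j, second_sorted: 5 if second_sorted else 10,
    fnl=lambda i, m, j: 7,
    sort_cost=lambda i, j: 4,
)
```

The three nested loops over `k`, `i` and `m` mirror the row-by-row construction of the two matrices described above.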
3.1 Minimum total cost without considering reductions and target class

Different sites are allowed to have different processing speeds, and each class can have duplicate copies at different sites. Let $P_j$ be the relative processing speed of site $j$. (If $P_1 = 1$ and $P_2 = 2$, the CPU at site 2 is twice as fast as that at site 1.) We now seek the minimum total cost for having a copy of $C_{1,n}$ at some site, where total cost is the sum of the local processing costs and communication costs involved. Let $T(a, X, b)$ be the communication cost for transferring $X$ amount of data from site $a$ to site $b$ in the local network. If $a = b$, then $T = 0$; otherwise, $T > 0$. Let $Cost_{i,j,k}$ be the minimum cost for having a copy of $C_{i,j}$ at site $k$, where the $C_{i,j}$ are defined in Section 1. There are two ways to have a copy of $C_{i,j}$ at site $k$:
• directly construct $C_{i,j}$ at site $k$;
• first construct $C_{i,j}$ at a site other than $k$ and then transfer a copy of $C_{i,j}$ to site $k$.
Let $Cost^{s}_{i,j,k}$ be the minimum cost for having a copy of $C^{s}_{i,j}$ at site $k$. Let $d_{i,j,k}$ be the minimum total cost for constructing $C_{i,j}$ at site $k$.

[...] all corresponding Costs for all $t \in SITES$ are computed. In the second iteration, all $d_{1,3,t}$, $d^{s}_{1,3,t}$, $d_{2,4,t}$ and $d^{s}_{2,4,t}$ and the corresponding Costs can be similarly computed. In the final iteration, $C_{1,4}$ at site $S_2$ is computed as the result (if we ignore the target class), where site $S_2$ is the answer site.
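The two ways of obtaining a copy of an intermediate result at a site (construct it there, or construct it elsewhere and ship it) can be sketched as a site-indexed table fill. The Python below is a simplified illustration, not the paper's algorithm: it ignores sorted states, and it assumes that local processing cost is the site-independent join work divided by the relative speed of the site. The callbacks `size`, `join_cost` and `transfer`, and all numbers, are hypothetical.

```python
def distributed_chain_dp(n, sites, home, size, join_cost, speed, transfer):
    """Return cost[i, j, k]: cheapest way to have a copy of C[i,j] at site k.

    home[i]: set of sites holding a copy of original class i.
    size(i, j): estimated size of C[i,j] (assumed).
    join_cost(i, m, j): site-independent work to join C[i,m] and C[m+1,j] (assumed).
    speed[k]: relative processing speed of site k.
    transfer(a, amount, b): communication cost T(a, X, b), zero when a == b.
    """
    cost = {}
    for i in range(1, n + 1):
        for k in sites:
            # Ship the nearest replica of the original class to site k.
            cost[i, i, k] = min(transfer(a, size(i, i), k) for a in home[i])
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            # d[k]: cheapest way to construct C[i,j] directly at site k.
            d = {
                k: min(
                    cost[i, m, k] + cost[m + 1, j, k] + join_cost(i, m, j) / speed[k]
                    for m in range(i, j)
                )
                for k in sites
            }
            for k in sites:
                # Either build at k (a == k) or build at a and transfer to k.
                cost[i, j, k] = min(d[a] + transfer(a, size(i, j), k) for a in sites)
    return cost


# Two hypothetical sites; class 1 is stored at A, class 2 at B.
table = distributed_chain_dp(
    n=2,
    sites=["A", "B"],
    home={1: {"A"}, 2: {"B"}},
    size=lambda i, j: 10,
    join_cost=lambda i, m, j: 6,
    speed={"A": 1, "B": 2},
    transfer=lambda a, amount, b: 0 if a == b else amount,
)
```

In this toy instance the faster site B is the cheaper place to perform the join, and having the result at A is cheapest when the join is performed at A itself rather than shipped back from B.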



3.2 Local reductions

In this subsection we discuss how to incorporate local reductions (projections and/or selections) into the basic algorithm to achieve the minimum total cost in a distributed environment. We first consider how reductions at one site can be incorporated into our basic algorithm. Clearly, if there are a number of selections/projections to be performed on a class at a given site, it is cheaper to perform all such reductions at the same time, instead of performing one at a time. Thus, we can assume that there is at most one such reduction per class. Again, the sorted status makes the problem more difficult.

Consider an intermediate result $C_{i,m}$, $m > i$, created during processing of a query. We may assume that all reductions on the original classes $C_i, C_{i+1}, \ldots, C_m$ have been performed by the time $C_{i,m}$ has been created, because joins are performed to form the intermediate result, and at the time joins are performed it is relatively cheap to perform local reductions on them, if any. Thus, the basic algorithm given in Section 3.1 is still applicable to all intermediate results. Therefore, it is sufficient to consider reductions and joins involving original classes. Specifically, if there is a join between two original classes at the same site, one or both of them may have reductions to be performed; there are four different ways to execute the join and the reductions:
• perform reductions on both classes before the join;
• perform reduction on one of the classes before the join, and while the join is being performed execute reduction on the other;
• same as the second way except that the order is reversed;
• perform the join first, and during the join perform the reductions.

One reduction method may be more expensive than another, but the intermediate results may be in different states (sorted or not). For example, scanning a class in ascending order of UIDs and selecting the objects based on certain criteria is expensive, but the intermediate result obtained is sorted; while performing the selection (in particular when it involves inequality) using indices is less expensive but may leave the intermediate result unordered. Having the class ordered could be beneficial in a subsequent join if the class is the second argument in the join operation. Let $o\_red(C_i)$ and $n\_red(C_i)$ be the minimum costs to perform reduction on $C_i$ with the resulting class ordered and not necessarily ordered, respectively. Let $rjc(C_{i,m}, C_{m+1,j})$ be the minimum cost to perform the reductions on the classes and the join between the classes. As discussed earlier, if $C_{i,m}$ and $C_{m+1,j}$ are intermediate results, all reductions must already have been performed and therefore the following is true:

$$rjc(C_{i,m}, C_{m+1,j}) = jc(C_{i,m}, C_{m+1,j}), \quad m > i \text{ and } j > m + 1. \qquad (8)$$

Now we proceed to consider the following three cases: (1) both $C_{i,m}$ and $C_{m+1,j}$ are original classes; (2) only $C_{i,m}$ is an original class; and (3) only $C_{m+1,j}$ is an original class.

Suppose $C_{i,m}$ is $C_i$ and $C_{m+1,j}$ is $C_{i+1}$. Then

$$rjc(C_i, C_{i+1}) = \min\{jc(C_i, C_{i+1}),\ \ldots\} \qquad (9)$$

$$\ldots\ n\_red(C_i) + jc(C_i, \ldots)\ \ldots \qquad (11)$$

Finally, let $rjc^{s}$ be the minimum join cost with the resulting class sorted while taking reduction into consideration. The equations for $rjc^{s}$ are similar to those for $rjc$.

We now consider the situation in which reductions on, and a join between, two classes at different sites are to be performed. In this case, we perform the reduction on any class before it is transferred to a different site for the join operation. The reason is that, when a class is to be transferred, all its objects are usually read from secondary memory into main memory to be packaged for the transmission, and it is relatively cheap to perform the reduction while they are in main memory. Furthermore, the reduction lowers the communication cost and the processing cost at the destination site.
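The choice among the four ways of combining reductions with a same-site join of two original classes can be sketched as a small enumeration. The Python below is a hedged illustration only: the function name `rjc_two_originals`, the callback `join_cost` and the numbers are hypothetical, and the cost model (a reduction is either paid up front or folded into the join) is an assumption standing in for the paper's formulas, which also distinguish ordered from unordered reductions.

```python
from itertools import product


def rjc_two_originals(red_cost_left, red_cost_right, join_cost):
    """Pick the cheapest of the four reduction/join orderings.

    red_cost_left, red_cost_right: cost of reducing each class before the join.
    join_cost(left_reduced, right_reduced): cost of the join, where any reduction
    not performed beforehand is applied while the join runs (assumed model).
    Returns (best_cost, (reduce_left_first, reduce_right_first)).
    """
    options = {}
    for left_first, right_first in product((True, False), repeat=2):
        options[left_first, right_first] = (
            (red_cost_left if left_first else 0)
            + (red_cost_right if right_first else 0)
            + join_cost(left_first, right_first)
        )
    best = min(options, key=options.get)
    return options[best], best


# Hypothetical join costs for each combination of pre-reduced operands.
join_table = {
    (True, True): 5,
    (True, False): 8,
    (False, True): 9,
    (False, False): 14,
}
best_cost, best_plan = rjc_two_originals(3, 4, lambda l, r: join_table[l, r])
```

With these numbers, reducing only the left class beforehand and reducing the right class during the join is the cheapest of the four options.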


3.3 Arbitrary target class and arbitrary answer site

In this subsection we show how to extend the above approach to allow an arbitrary target class at an arbitrary answer site. We first review, in a centralised environment, how to obtain the qualified IDs for the target class from the IDs in $C_{1,n}$. Usually, a user is interested in retrieving the IDs of a given class. The class whose objects are required by the user is called the target class, denoted by $C_t$. In query Q1 (see Figure 1), the target class Auto is the leftmost class in the chain. But in general the target class can be any class in the chain. The following query Q2 is used to find all persons who own at least one automobile and live in cities within states with populations over 10000. The target class in this query, Person, is not the leftmost class in the chain.

select Person where owner.hometown.state.population > 10000

In the earlier discussion, $C_{1,n}$ is computed without designating the target class. Assume $C_{1,n}$ is obtained from $C_{1,m} \otimes C_{m+1,n}$ for some $m$, $1 \le m < n$. After $C_{1,n}[ID_{m+1}]$ is obtained, we can obtain the IDs for the target class $C_t$ by propagating the result from $C_{1,n}[ID_{m+1}]$ to $C_{t-1}[ID_{t-1}, ID_t]$, since the second attribute of $C_{t-1}$ is $ID_t$, i.e. the desired result. Propagation is carried out by a series of joins. For example, when $t - 1 \ge m + 1$, the propagation from $C_{1,n}[ID_{m+1}]$, denoted by $C'_{m+1}[ID_{m+1}]$, to $C_{t-1}[ID_{t-1}, ID_t]$ consists of joining $C'_{m+1}[ID_{m+1}]$ with $C_{m+1}[ID_{m+1}, ID_{m+2}]$ to yield an intermediate result on attribute $ID_{m+2}$. This is then joined with $C_{m+2}[ID_{m+2}, ID_{m+3}]$ to yield an intermediate result on $ID_{m+3}$. This process is repeated until a join with $C_{t-1}[ID_{t-1}, ID_t]$ is performed to yield the answer. The correctness of such propagations is discussed at the end of this subsection.

Let the cost of the propagation from $C_{1,n}[ID_{m+1}]$ to $C_{t-1}[ID_{t-1}, ID_t]$ be $P(C'_{m+1}, C_{t-1})$. An optimal algorithm that computes the answer while taking the propagation cost into consideration is as follows. Compute $Cost_{i,j}$ and $Cost^{s}_{i,j}$ as in Section 2 for all $j - i \le n - 2$. Then at the last stage $Cost_{1,n}$ is redefined as follows. (We can see that if $t = m + 1$, the answer is $C_{1,n}[ID_{m+1}]$ and $P(C'_{m+1}, C_{t-1}) = 0$, that is, no propagation is needed in this case.)*

$$Cost_{1,n} = \ldots$$


[...] $[ID_{m+1}]$ from $C_{1,m}$ and $C_{m+1,n}$ results in $C_{m+1,n}$ fully reduced, i.e. $C'_{m+1}$ is obtained. In order to compute the answer, $C'_{m+1}$ is propagated to the target class $C_t$ by computing $C'_{m+1} \otimes C_{m+1} \otimes \cdots \otimes C_{t-1}[ID_t]$, assuming $t - 1 \ge m + 1$ without loss of generality. Let $G^{k}_{j-1}$ be the minimum total cost to perform $C'_{j-1} \otimes C_{j-1} \otimes C_j \otimes \cdots \otimes C_{t-1}[ID_t]$ with a copy of $C'_{j-1}$ at site $k$ and with the answer at destination site $w$. After $C'_{j-1} \otimes C_{j-1}[ID_j]$, a set of object IDs for $C_j$, i.e. a set of $ID_j$ values, denoted by $C'_j$, can be obtained. We seek the minimum propagation cost $G^{q}_{m+1}$, where $q$ is the site at which $C_{1,n}$ is obtained from $C_{1,m} \otimes C_{m+1,n}$, i.e. $G^{q}_{m+1}$ is the minimum cost to compute $C'_{m+1} \otimes C_{m+1} \otimes C_{m+2} \otimes \cdots \otimes C_{t-1}[ID_t]$ and yield the answer $C'_t[ID_t]$ at answer site $w$, where $C'_{m+1}$ is obtained at site $q$.

The way we compute all such $G^{x}_{l}$, $m + 1 \le l \le t - 1$ and $x \in SITES$, is to compute all $G^{x}_{t-1}$ for all $x \in SITES$ first, then all $G^{x}_{t-2}$ for all $x \in SITES$, and so on. $G^{k}_{j-1}$, for site $k$, $m + 1 \le j - 1 \le t - 1$, can be obtained from $G^{x}_{j}$, $x \in SITES$, as follows. Suppose that the original class $C_{j-1}$ is at site $u$. In general, in order to compute $G^{k}_{j-1}$, the following steps are taken.
(1) Transfer $C'_{j-1}$ from site $k$ to a site $v$, $v \in SITES$.
(2) Transfer $C_{j-1}$ from site $u$ to site $v$.
(3) Join $C'_{j-1}$ and $C_{j-1}$ at site $v$ to produce $C'_j$ at site $v$.
(4) Transfer $C'_j$ from site $v$ to some site $x$, $x \in SITES$.
(5) At site $x$, $C'_j$ is obtained. The minimum propagation cost from $C'_j$ to the target class at the answer site is, by definition, $G^{x}_{j}$.
To ensure that Steps (1)-(4) yield the minimum cost, site $v$ and site $x$ should be varied over all sites in SITES. Thus the following recursive equation is obtained:

$$G^{k}_{j-1} = \min_{v,\,x \in SITES}\{\ldots\} \qquad (13)$$

where $T(s, |R|, d)$ is the cost for transferring $|R|$ amount of data from site $s$ to site $d$. Clearly, $G^{w}_{t} = 0$, where $C_t$ is the target class and $w$ is the answer site. We start by computing $G^{k}_{t-1}$ for all sites $k \in SITES$, which involves computing $C'_{t-1} \otimes C_{t-1}[ID_t]$ where $C'_{t-1}$ is at site $k$. Equation (13) can be applied with $G^{w}_{t} = 0$, and $x$ is restricted to $w$ only for the first iteration. Although $v$ ranges over all sites in SITES in Equation (13), it is important to observe that it is sufficient, in achieving the minimum, to vary $v$ over only the sites $k$, $x$, $u$ and the site $v_0$ that has the fastest speed. Thus, $G^{k}_{j-1}$ can be computed in $O(h_1)$. Therefore, $G^{k}_{j-1}$ for different values of $k$ can be computed in $O(h_1^2)$. We start this computation with $j = t$ and progress until $j = m + 2$. This takes no more than $O(n * h_1^2)$ time. Recall that $h_1$ is bounded by $O(n)$. Thus the computation of all $G^{k}_{j-1}$ takes $O(n^2 * h_1)$ time. Having computed $G^{y}_{m+1}$, $2 \le m + 1 \le n$, $y \in SITES$, the minimum cost MIN_COST to obtain the answer at site $w$ is
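Steps (1) to (5) of the propagation can be sketched as a backward recursion over the chain positions. The Python below is a minimal illustration under explicit assumptions, not a transcription of Equation (13): the join at site v is charged as a site-independent amount of work divided by the relative speed of v, the callbacks `red_size`, `class_size`, `join_cost` and `transfer` are hypothetical, and v ranges over all sites rather than only the four candidate sites noted in the text.

```python
import math


def propagation_costs(m, t, sites, w, home, red_size, class_size,
                      join_cost, speed, transfer):
    """Return G[j, k]: cheapest cost to propagate the reduced ID set of class j,
    held at site k, through classes j .. t-1 and deliver the target IDs to w.

    home[j]: site u storing original class j.
    red_size(j): estimated size of the reduced ID set of class j (assumed).
    class_size(j): size of original class j (assumed).
    join_cost(j): site-independent work of joining the two (assumed).
    """
    # Base case: the target IDs must end up at the answer site w.
    G = {(t, k): (0.0 if k == w else math.inf) for k in sites}
    for j in range(t, m + 1, -1):              # j = t, t-1, ..., m+2
        u = home[j - 1]
        for k in sites:
            G[j - 1, k] = min(
                transfer(k, red_size(j - 1), v)       # step (1)
                + transfer(u, class_size(j - 1), v)   # step (2)
                + join_cost(j - 1) / speed[v]         # step (3)
                + transfer(v, red_size(j), x)         # step (4)
                + G[j, x]                             # step (5)
                for v in sites
                for x in sites
            )
    return G


# Hypothetical two-site example with m = 1 and target class t = 3.
G = propagation_costs(
    m=1, t=3, sites=["A", "B"], w="A",
    home={2: "B"},
    red_size=lambda j: 2,
    class_size=lambda j: 10,
    join_cost=lambda j: 4,
    speed={"A": 1, "B": 2},
    transfer=lambda a, amount, b: 0 if a == b else amount,
)
```

In this example it is cheaper to send the small reduced ID set to site B, where the original class resides and processing is faster, and then ship the result back to the answer site A.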

* We note that an alternative approach is to associate the UIDs of the target class $C_t$ with all the intermediate results $C_{i,j}$, $i \le t \le j$, that is, the scheme for all such $C_{i,j}$, $i \le t$
