Distributed and Parallel Databases, 8, 155–179 (2000). © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Algorithms and Support for Horizontal Class Partitioning in Object-Oriented Databases∗

LADJEL BELLATRECHE  [email protected]
KAMALAKAR KARLAPALEM  [email protected]
Department of Computer Science, University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, People's Republic of China

ANA SIMONET  [email protected]
TIMC-IMAG Laboratory, Faculty of Medicine, 38706 La Tronche, France

Abstract. Horizontal partitioning is a logical database design technique which facilitates efficient execution of queries by reducing the number of irrelevant objects accessed. Given a set of most frequently executed queries on a class, horizontal partitioning generates horizontal class fragments (each of which is a subset of the object instances of the class) that meet the query requirements. There are two types of horizontal class partitioning, namely, primary and derived. Primary horizontal partitioning of a class is performed using predicates of queries accessing the class. Derived horizontal partitioning of a class is the partitioning of a class based on the horizontal partitioning of another class. We present algorithms for both primary and derived horizontal partitioning, discuss some issues in derived horizontal partitioning, and present their solutions. There are two important aspects of supporting database operations on a partitioned database, namely, fragment localization for queries and object migration for updates. Fragment localization deals with identifying the horizontal fragments that contribute to the result of the query, and object migration deals with migrating objects from one class fragment to another due to updates. We provide novel solutions to these two problems, and finally we show the utility of horizontal partitioning for query processing.

Keywords: object-oriented database, horizontal class partitioning, fragment localization, object migration, performance evaluation

1. Introduction

Database partitioning enhances the performance of applications as it reduces the amount of irrelevant data to be accessed and transferred among different sites in a distributed system. Horizontal partitioning (HP) is the breaking up of a relation (in the relational model) or a class (in the object model) into a set of horizontal fragments, each containing only a subset of its tuples/instance objects. The HP of a class into horizontal class fragments (HCFs) also permits concurrent processing, since queries can access multiple HCFs of the class simultaneously [13]. The problem of partitioning in distributed relational database systems has been recognized for its impact on the performance of the system as a whole [11, 28, 30]. The object-oriented database (OODB) environment based on an object-oriented data model is built around the fundamental concept of an object and includes features such as

∗A preliminary version of this work appeared in DEXA'97.

156

BELLATRECHE, KARLAPALEM AND SIMONET

encapsulation, inheritance, and class composition hierarchy. For these reasons we cannot directly apply the HP solutions developed for the relational model [28, 30] to the object-oriented data model. There are three major aspects which need to be addressed in horizontal class partitioning (HCP). The first is the design of algorithms for optimal HCP; it is well known that generating an optimal HCP is very difficult, and that heuristics need to be employed to provide a reasonable and intuitive solution [11, 13, 27]. The second aspect is support for executing queries on a horizontally partitioned OODB system. Techniques have been developed [30] which use a query decomposer and data localization for identifying the HCFs that contribute to the query result in distributed relational database systems; similar techniques and algorithms need to be developed for horizontally partitioned OODB systems. The third is the utility of HCP in OODBs in terms of reducing the number of irrelevant disk accesses.

1.1. Related work

Several researchers have worked on HP in relational databases (RDBs), including [11, 30]. Ceri et al. [11] showed that the main optimization parameter needed for HP is the number of accesses performed by the application programs to different portions of data. They specified application data requirements for HP in terms of a set of simple predicates and used access pattern information to achieve the HP. A set of simple predicates is complete if, and only if, the probability of accessing any two records belonging to the same minterm fragment is the same (defined in [11]). A complete set of predicates is minimal if, and only if, all its elements are relevant. Two types of HP are distinguished in [11]: primary and derived HP. A primary HP of a relation is performed using the predicates that are defined on that relation. The derived HP of a relation is not based on the predicates defined on its own attributes, but is derived from the HP of another relation. Özsu et al. [30] specified the database information needed for HP of a relation, and developed a primary HP algorithm whose steps are as follows:

1. Determine the set of complete and minimal predicates P = {p1, p2, ..., pn} [11].
2. Determine the set of minterm predicates M of P, which is defined as follows:
   M = {mi | mi = ∧_{pj ∈ P} p*j}, where p*j = pj or p*j = ¬pj (1 ≤ j ≤ n), (1 ≤ i ≤ 2^n).
3. Eliminate the contradictory minterm predicates using the predicate implications.

We note that this approach can generate 2^n horizontal fragments for n simple predicates. Ezeife and Barker [13] presented a comprehensive set of algorithms for horizontal partitioning in OODBs for four class models: classes with simple attributes and methods, classes with complex attributes and simple methods, classes with simple attributes and complex methods, and classes with complex attributes and methods. The authors defined a primary HCP as the partitioning of a class based on queries accessing only one class.
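For illustration, the minterm construction in steps 1–2 above can be sketched in code. This is a minimal sketch: the string encoding of predicates and the example predicates are our own, not from the paper, and contradiction elimination (step 3) is omitted.

```python
from itertools import product

def minterm_predicates(predicates):
    """Enumerate all minterm predicates of a set of simple predicates:
    every conjunction in which each predicate appears either as-is or
    negated.  For n predicates this yields 2**n candidate fragments."""
    terms = []
    for signs in product((True, False), repeat=len(predicates)):
        conj = [p if pos else f"NOT({p})" for p, pos in zip(predicates, signs)]
        terms.append(" AND ".join(conj))
    return terms

# Three illustrative simple predicates give 2**3 = 8 minterms.
minterms = minterm_predicates(["Duration <= 4", "Cost > 7000", "Location = 'CA'"])
print(len(minterms))  # 8
```

The exponential growth of this candidate set is exactly the drawback discussed in the remainder of this section.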
Ezeife and Barker [13] used the same algorithm defined by Özsu and Valduriez [30] with the same simple predicates. However, the number of minterms generated by this algorithm is exponential in the number of simple predicates. In [7], we presented preliminary work on a primary algorithm which is an extension of the algorithm proposed by [27], but we


did not consider in detail the derived HCP (DHCP), the issues and problems related to it, HCF localization, object migration, or the performance of the primary and derived algorithms.

1.2. Contributions and organization of the paper

This paper addresses the problem of HCP in OODBs, and provides primary and derived algorithms for horizontally partitioning a class. Ezeife and Barker [13] proposed a primary algorithm, but it has two major drawbacks. The first is that the queries the authors considered contained only simple predicates. The second is that the authors used the same algorithm developed by Ceri et al. [11] with simple predicates; however, the number of minterms (potential horizontal fragments) generated by this algorithm is exponential in the number of simple predicates. Our primary algorithm (PA) considers queries which contain either simple or component predicates. The HCFs generated by the PA are based on a graph-theoretic algorithm which clusters a set of predicates into a set of HCFs [28]. A major feature of this algorithm is that all HCFs are generated in a single iteration, in time O(l × n²), where l and n represent the number of queries and the number of predicates, respectively; it is thus more efficient than the algorithm developed in [13]. Further, we discuss issues in DHCP and present solutions. We also address two important aspects of supporting OODB operations on a horizontally partitioned OODB, namely, HCF localization for queries and object migration for updates, which were ignored by [13]. HCF localization deals with identifying the HCFs that contribute to the result of a query, and object migration deals with migrating objects that move from one HCF to another due to updates on objects. We provide solutions to both problems. Another drawback of the previous work in RDBs and OODBs [11, 13, 27, 30] is that it does not provide a quantitative metric showing the utility of HCP, nor under which circumstances primary and derived HCP provide good performance for query execution. The main contributions of this paper include:

1. Addressing the primary and the derived horizontal partitioning of an OODB based on query access patterns.
2. Developing two algorithms (primary and derived) to achieve the HCP.
3. Evaluating and presenting solutions for two problems generated by DHCP, namely, multiple primary HCF candidates for a class to be derived horizontally partitioned, and multiple derived HCF candidates for a given primary horizontally partitioned class.
4. Presenting algorithms for HCF localization for queries and object migration for updates.
5. Presenting an overview of our evaluation of the utility of the primary and derived HCP using a cost model.

The rest of the paper is organized as follows. Section 2 gives some definitions used in specifying the problem of HCP and describes the primary algorithm and its performance. Section 3 presents the DHCP algorithm and its performance. In Section 4, problems due to DHCP and their solutions are provided. In Section 5, the HCF localization and object migration problems are addressed. Finally, Section 6 summarizes our contribution and outlines future work.

2. Horizontal class partitioning

2.1. A core object oriented data model

In this subsection, we briefly review a collection of concepts that are essential to an OODB. We focus on the basic concepts that are mandatory and/or common to most OODB models and systems [1, 24]. These elementary concepts also form the core of the OODB model which we assume in this paper. The model is built around the fundamental concept of an object. An object represents a real entity and is an abstraction defined by (a) a unique object identifier (OID), which remains invariant during the life of the object and is used as a means of referencing the object from other object(s), (b) a set of attribute values which define the state of the object, and (c) an interface consisting of encapsulated methods which manipulate the object. Objects having the same attributes and methods are grouped into a class. An instance of a class is an object with an OID and a value for each of its attributes. Classes are organized into an inheritance hierarchy by using the specialization property (isa), in which a subclass inherits the attributes and methods defined in its superclass(es). The database contains a root class which is an ancestor of every other class in the database. A class in an OODB is represented by a set of attributes A and a set of methods M. For each attribute, the set of values it may have is defined by its domain. Two types of attributes are possible [2]: a simple attribute can only have an atomic domain (such as integer, string, etc.), and a complex attribute has a class as its domain. Thus, there is a hierarchy which arises from the aggregation relationship among classes through their complex attributes. This hierarchy is known as the class composition hierarchy, which is a rooted directed graph (RDG) where the nodes are the classes, and an arc exists between a pair of classes Ci and Cj if Cj is the domain of an attribute of Ci. Figure 1 shows a class composition hierarchy that will be used in the remainder of the paper.
Moreover, attributes can be single-valued or multi-valued. Multi-valued attributes are defined by using constructors (Const) such as set, list, tree, and array [10]. The methods have a signature including the method's name, a list of parameters, and a list of return values, which can be an atomic value or an OID.

Figure 1. The class composition hierarchy for the class Employee.

2.2. Basic concepts & definitions

Before describing the HCP algorithms, some definitions are introduced below.

Definition 1 (Simple Predicate [11]). A simple predicate is a predicate defined on a simple attribute or a method, of the form: attribute-method operator value, where operator is a comparison operator (=, <, >, ≤, ≥, ≠). The value is chosen from the domain of the attribute or from the domain of values returned by the method.

Definition 2 (Path [10]). A path P represents a branch in a class composition hierarchy and is specified by C1 · A1 · A2 · · · An (n ≥ 1) where:

• C1 is a class in the database schema
• A1 is an attribute of class C1
• Ai is an attribute of class Ci such that Ci is the domain of the attribute Ai−1 of class Ci−1 (1 < i ≤ n).

From the last class Cn in the path, either an attribute An can be accessed or a method can be invoked. The length of the path P is defined by the number of attributes, n, in P. A predicate defined on a path is called a component predicate [7]. We consider a typical object query language (OQL):

SELECT “result list”
FROM “target class”
WHERE “qualification clause”

The “qualification clause” defines a boolean combination of predicates by using the logical connectives ∧, ∨, ¬. A query that involves only simple predicates is called a simple query [2].

Example 1. Let Cost( ): real be a method defined on the class Project of figure 1, giving the cost of a project. A query that retrieves the names of all projects that have a cost greater than $7000 is expressed as follows:

SELECT Pname
FROM p Project
WHERE p.Cost( ) > $7000   % simple predicate

A query that involves component predicate(s) will be called a component query [2].

Example 2. A component query which retrieves the names of all employees who work on the “Database” project is formulated as follows:

SELECT Ename
FROM e Employee
WHERE e.Dept.Proj.Pname = “Database”   % component predicate

Definition 3. The access frequency of a query [13] is the number of accesses a query makes to data during a specific period. Data in this context can be a class, a HCF, or an object instance.

Figure 2. Fan-out(Ci, ak, Ci+1) & Share(Ci, al, Ci+1).

Definition 4 (Fan-out [9] & Share [21]). Let Ci and Ci+1 be two classes in a path P (Ci+1 is the domain of an attribute ak of Ci). The fan-out between the two classes Ci and Ci+1, namely fan(Ci, ak, Ci+1), is defined as the average number of objects of Ci+1 that are referenced by an object of Ci through attribute ak, as shown in figure 2. Similarly, the sharing level between the two classes Ci and Ci+1, namely share(Ci, al, Ci+1), is defined as the average number of objects of Ci that reference the same object in Ci+1 through attribute al.

2.3. Primary horizontal class partitioning algorithm

The primary algorithm (PA) of a class C is performed using simple and component predicates of a set of queries accessing this class. PA takes as input a set of queries Q = {q1, q2, ..., ql} and their access frequencies. These queries respect the 80/20 rule; that is, 20% of user queries account for 80% of the total data access in a database system [27]. The basic concept of our PA is the “affinity between predicates”: predicates having a high affinity are grouped together to form a HCF. The PA starts by performing an analysis of the predicates of the set of queries Q. From these queries, we define the use parameter as: use(qh, pi) = 1 if the predicate pi is used by the query qh, and 0 otherwise. The use values are used to generate a predicate usage matrix and then a predicate affinity matrix, where each value (pi, pi′) represents the sum of the access frequencies of all queries that access predicates pi and pi′ together. After that, we apply the graph-based algorithm [27] that considers the predicate affinity matrix as a complete graph, with each node representing a predicate and an edge between two nodes representing the affinity between them. A linearly connected spanning tree is then formed, resulting in a set of clusters of nodes, forming sets of predicates. After this step, we optimize each set of predicates (if possible) by using implications between the predicates. Finally, we generate the HCFs, where each HCF is defined by a boolean combination of predicates using the logical connectives (∧, ∨). In order to illustrate the working of the steps in the algorithm, we use the following example. Figure 1 shows the class composition hierarchy of a class Employee that has

Table 1. A set of queries accessing the class Project for the running example.

q1 (Freq 20):
    Select PId
    From p Project
    Where p.Duration ≤ 3 ∧ p.Cost( ) > 7000

q2 (Freq 35):
    Select Pname
    From p Project
    Where p.Duration ≤ 4 ∧ p.Cost( ) ≤ 7000

q3 (Freq 30):
    Select Dname
    From d Department
    Where d.Proj.Duration = 2 ∧ d.Proj.Cost( ) ≤ 7000 ∧ d.Proj.Location = “CA”

q4 (Freq 15):
    Select Esalary
    From e Employee
    Where 5 ≤ e.Dept.Proj.Duration ≤ 6 ∧ e.Dept.Proj.Cost( ) ≤ 7000

two complex attributes, Dept and Own, and three simple attributes, EmpId, Ename and Esalary, representing the identifier, the name and the salary of an employee, respectively. The Project class will be horizontally partitioned based on the set of queries (simple and component) accessing this class, as shown in Table 1. The column Freq shows the frequency of the queries. The details of this algorithm are given below.

The steps of the Primary Algorithm
Inputs: The class C to be horizontally partitioned, and a set of queries with their frequencies.

Step 0: Determine the set of predicates Pr used by the queries defined on the class C. These predicates are defined on a subset of attributes A′ (A′ ⊆ A) and a subset of methods M′ (M′ ⊆ M). We call each element of A′ and M′ a relevant predicate attribute and a relevant predicate method, respectively. Let n, r, and u be the cardinalities of Pr, A′ and M′, respectively.

Example 3. The predicates used by the queries {q1, q2, q3, q4} of our running example are:

1. q1: (p1): p.Duration ≤ 3, (p5): p.Cost( ) > $7000
2. q2: (p2): p.Duration ≤ 4, (p5): p.Cost( ) > $7000
3. q3: (p3): d.Proj.Duration = 2, (p6): d.Proj.Cost( ) ≤ $7000, (p7): d.Proj.Location = “CA”
4. q4: (p4): 5 ≤ Duration ≤ 6, (p6): e.Dept.Proj.Cost( ) ≤ $7000

There are two attributes used in these predicates, Duration and Location, which will be renamed a1 and a2, respectively, and one method, Cost( ), which will be renamed m.
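To make the upcoming use values, affinities, and clustering concrete, they can be sketched in code. The query/predicate assignments and frequencies below follow Table 1 and Example 3; the clustering shown is a plain connected-components simplification of our own, not the spanning-tree/cycle method of [27].

```python
# Query -> (frequency, predicates used), following Table 1 / Example 3.
queries = {
    "q1": (20, {"p1", "p5"}),
    "q2": (35, {"p2", "p5"}),
    "q3": (30, {"p3", "p6", "p7"}),
    "q4": (15, {"p4", "p6"}),
}
predicates = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]

# Step 1: predicate usage matrix, use(qh, pi) = 1 iff qh uses pi.
use = {q: {p: int(p in ps) for p in predicates} for q, (_, ps) in queries.items()}

# Step 2 (numeric part): aff(pi, pj) = total frequency of queries using both.
def aff(pi, pj):
    return sum(f for f, ps in queries.values() if pi in ps and pj in ps)

# Simplified stand-in for Step 3: connected components of the affinity
# graph, keeping only edges whose affinity exceeds a threshold.
def clusters(threshold=15):
    groups, seen = [], set()
    for p in predicates:
        if p in seen:
            continue
        comp, stack = set(), [p]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(v for v in predicates
                         if v not in comp and aff(u, v) > threshold)
        seen |= comp
        groups.append(comp)
    return groups
```

With these numbers the components come out as {p1, p2, p5}, {p3, p6, p7} and {p4}, matching the three partitions obtained in Step 3 below, though in general the method of [27] and connected components need not coincide.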

Figure 3. Predicate usage matrix.

Step 1: Build the predicate usage matrix (PUM) of class C. This matrix contains queries as rows and predicates as columns. The (h, i)th (1 ≤ h ≤ l, 1 ≤ i ≤ n) value of this matrix equals 1 if query qh uses predicate pi; else it is 0.

Example (Continued). The PUM of our running example is shown in figure 3. The element (1, 1) of the PUM is 1 (see figure 3), because q1 uses the predicate p1.

Step 2: Construct the predicate affinity matrix (PAM), where each value aff(pi, pi′) (1 ≤ i, i′ ≤ n) between two predicates pi and pi′ can be: numeric, representing the sum of the frequencies of the queries which access the predicates pi and pi′ simultaneously; or non-numeric, where the value “⇒” of aff(pi, pi′) means that predicate pi implies predicate pi′, the value “⇐” means that predicate pi′ implies predicate pi, and the value “*” means that the two predicates pi and pi′ are “similar” in that both are used jointly with a predicate pi″. That is, there exists a query q which uses predicates pi and pi″ and another query q′ which uses predicates pi′ and pi″. This information is provided by the database designer.

Example (Continued). We now construct the PAM of our running example. For example, the value (p1, p5) indicates that the objects satisfying predicates p1 and p5 are accessed by queries (in this case, only query q1) with frequency 20. This matrix is shown in figure 4. Note that p1 ⇒ p2, p3 ⇒ p1, p3 ⇒ p2, p1 and p2 are similar, and p3 and p4 are similar.

Step 3: In this step, we apply the algorithm developed in [27] to group the predicates into clusters such that the predicates in each cluster demonstrate high affinity to one another. This algorithm starts from the predicate affinity matrix, considering it as a complete graph called the “affinity graph” in which an edge value represents the affinity between the two predicates, and then forms a linearly connected spanning tree.
A linearly connected spanning tree is a tree that is constructed by including one edge at a time, such that only edges at the first and the last node of the tree are considered for extension. We then form affinity cycles in this spanning tree by including the edges of high affinity value around the nodes, and extending these cycles as much as possible. After the cycles are formed, partitions are easily generated by cutting the cycles apart along cut edges. The partitions of the graph generate a set of subsets P = {P1, P2, ..., Pγ} of predicates.

Figure 4. Predicate affinity matrix.

Figure 5. Predicate sets generated by the graphical technique.

Example (Continued). After applying Step 3 of the PA to this matrix we get three partitions of predicates, P1 = {p1, p2, p5}, P2 = {p3, p6, p7}, P3 = {p4}, as shown in figure 5.

Step 4: In this step, we optimize the predicates contained in each subset by using the predicate implications obtained from the predicate affinity matrix. This reduces the number of predicates in each subset, yielding a set of subsets P′ = {P′1, P′2, ..., P′α} (α ≤ γ) of optimized predicates that will be used in the next step.

Example (Continued). The subset P1 is refined into P′1 = {p2, p5} because p1 ⇒ p2. After Step 4, we obtain three subsets: P′1 = {p2, p5}, P′2 = {p3, p6, p7}, P′3 = {p4}.

Step 5: Note that the subsets of predicates resulting from Step 4 may not cover all relevant attributes and methods. For example, the subset P′3 contains only one predicate, p4, which is defined on the attribute Duration. Then, for each element P′i of P′ resulting from Step 4, we enumerate the attributes and methods Vi which are not used by the predicates in P′i. Based on the cardinality of Vi (|Vi|), we have two cases:

1. |Vi| = 0 means that P′i covers all relevant attributes and methods.
2. Otherwise (|Vi| ≠ 0), for each element vij of Vi, we enumerate the predicates {pj1, pj2, ..., pjwij} (with wij as cardinality) defined on it. After that, we split P′i into wij subsets Pik (1 ≤ k ≤ wij), where each Pik contains the predicate(s) of P′i plus pjk.

We repeat this step until each subset uses all the relevant attributes and methods. To do so, we build the attribute-method usage matrix (AMUM) from P′, which is an (α × (r + u)) matrix containing the elements of P′ as rows and the elements of A′ ∪ M′ as columns. Each value (i, j) of this matrix is equal to 1 if the jth attribute or method of A′ ∪ M′ is used by a predicate in P′i; otherwise it is 0.

Example (Continued). In our running example, we have two relevant attributes (see Step 0), Duration (a1) and Location (a2), and one relevant method, Cost( ) (m). The subsets

of predicates resulting from Step 4 are three (P′1, P′2, P′3). The AMUM therefore contains three rows and three columns, as shown in figure 6. Note that P′1 contains two predicates, p2 and p5, which are defined on a1 and m; therefore, the values of (P′1, a1) and (P′1, m) are set to 1.

Figure 6. Attribute-method usage matrix.

Note that P′2 uses all the relevant attributes and methods (case 1), but the subset P′1 does not use a2. In this case, we determine the predicates defined on a2. There is only one such predicate (p7: Location = “CA”); therefore, we split P′1 into P11 by adding the predicate p7, obtaining P11 = {p2, p5, p7}. Similarly, the subset P′3 is split into P31 = {p4, p5, p7} and P32 = {p4, p6, p7}. Finally, four subsets of predicates are obtained: P11 = {p2, p5, p7}, P′2 = {p3, p6, p7}, P31 = {p4, p5, p7} and P32 = {p4, p6, p7}.

Step 6: The final HCFs are defined by the subsets of predicates resulting from Step 5. If the predicates in a subset refer to the same attribute or method we link them by an OR connector; otherwise we use an AND connector. The final number of HCFs equals the number of subsets obtained in Step 5, plus one which is the negation of the disjunction of all predicates defining the HCFs.

Step 7: Our algorithm may give rise to overlapping HCFs. This step consists of refining these HCFs in order to obtain non-overlapping HCFs. Note that overlapping HCFs have some predicates that are defined on the same attribute or method. We first determine these predicates, and combine the overlapping HCFs into a new HCF, where the predicate corresponding to an attribute (or method) covers the domains of all overlapping predicates.

Example (Continued).

Finally, from Steps 6 and 7 the HCFs generated are:

Project1 with clause cl1: (Duration ≤ 4) ∧ (Cost( ) > 7000) ∧ (Location = “CA”).
Project2 with clause cl2: (Duration = 2) ∧ (Cost( ) ≤ 7000) ∧ (Location = “CA”).
Project3 with clause cl3: (5 ≤ Duration ≤ 6) ∧ (Cost( ) > 7000) ∧ (Location = “CA”).
Project4 with clause cl4: (5 ≤ Duration ≤ 6) ∧ (Cost( ) ≤ 7000) ∧ (Location = “CA”).
Project5 with clause cl5: ¬(cl1 ∨ cl2 ∨ cl3 ∨ cl4).

Note that Project5 represents the negation of the disjunction of all clauses previously defined. Based on the HCFs of class Project, we can identify the HCFs needed by each query of Table 1. Query q1 accesses only the HCF Project1, query q2 accesses only the HCF Project2, query q3 accesses only the HCF Project3, and query q4 accesses only the HCF Project4. The HCFs of class Project are represented using the scheme developed in [18], as illustrated in figure 7.
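These qualification clauses can be evaluated mechanically to route an object to its fragment. A sketch follows, with the clauses cl1–cl4 transcribed as Python predicates over dictionary-shaped project objects; the field names are our own illustration.

```python
clauses = {  # cl1..cl4 from Steps 6 and 7, transcribed as predicates
    "Project1": lambda o: o["Duration"] <= 4 and o["Cost"] > 7000 and o["Location"] == "CA",
    "Project2": lambda o: o["Duration"] == 2 and o["Cost"] <= 7000 and o["Location"] == "CA",
    "Project3": lambda o: 5 <= o["Duration"] <= 6 and o["Cost"] > 7000 and o["Location"] == "CA",
    "Project4": lambda o: 5 <= o["Duration"] <= 6 and o["Cost"] <= 7000 and o["Location"] == "CA",
}

def fragment_of(obj):
    """Return the HCF whose clause obj satisfies; Project5 is the catch-all
    fragment defined by the negation of cl1 ∨ cl2 ∨ cl3 ∨ cl4."""
    for name, clause in clauses.items():
        if clause(obj):
            return name
    return "Project5"

print(fragment_of({"Duration": 3, "Cost": 9000, "Location": "CA"}))  # Project1
print(fragment_of({"Duration": 8, "Cost": 1000, "Location": "LA"}))  # Project5
```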

Figure 7. Representation of the primary horizontal class fragments.

The complexity of the primary algorithm comes from Steps 2 and 5, which are O(l × n²) and O(α × (r + u)), respectively, where n, l, α, r, and u represent the number of predicates, the number of queries, the number of HCFs, the number of attributes used by the queries, and the number of methods used by the queries, respectively. Therefore, the complexity of our algorithm is O(l × n² + α × (r + u)).

2.4. Evaluation of the primary horizontal partitioning

In [4, 5], we developed an analytical cost model for executing, on horizontally partitioned class(es), a set of l queries {q1, q2, ..., ql} that obey the 80/20 rule [27] and contain simple and/or component predicates. The objective of the cost model is to calculate the cost of executing these queries. The cost of executing a query is directly proportional to the number of pages it accesses; the total cost is estimated in terms of disk page accesses. Our cost model considers only the IO Cost, that is, the input/output cost of reading and writing data between main memory and disk. This is because, for very large database applications with a large number of data accesses, the contribution of the CPU Cost (the cost of executing CPU instructions) towards the total cost is not significant [4], and the OODB we consider is centralized, so the COM Cost (the cost of network communication among different nodes) is non-existent. The parameters of this cost model are classified into three categories: database (such as the cardinality of the root class, object length (LC), page size of the system file (PS), Fan-out, and Share), query (such as selectivity, length of the output result, number of queries, frequency, and the predicates used by these queries), and the HCFs generated by the primary HCP (such as the number of HCFs and the cardinality of each HCF). The complete details of the cost model are given in [4, 5]. The input parameter values for the class cardinality, LC, PS, Share, Fan-out, and the number of queries and their frequencies can be obtained in several ways [8]. Either they can be provided by the database


administrator, or they can be automatically acquired by the system through data inspection or sampling [17, 25]. The selectivity can be estimated by using the techniques described in [8, 12]. The cardinality of a HCF Fj of a class Ci is calculated as ||Fj|| = sel(Fj) × ||Ci|| for all 1 ≤ j ≤ α. Based on the cardinality of the root class, the cardinality of the other classes (in the class composition hierarchy) can be computed using the following formula. Let the root class be C1. For each class Cj such that there is a path from C1 to Cj of length nj, the cardinality of Cj is given by Eq. (1):

    ||Cj|| = [ ∏_{i=1}^{nj−1} Fan(Ci, Ci+1) / Share(Ci+1, Ci) ] × ||C1||        (1)
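Eq. (1) can be evaluated directly; here is a minimal sketch in which the per-hop fan-out and share figures are illustrative numbers, not values from the paper.

```python
def class_cardinality(root_cardinality, hops):
    """Eq. (1): ||Cj|| = (prod over path arcs of Fan(Ci, Ci+1) / Share(Ci+1, Ci)) * ||C1||.
    `hops` lists (fan_out, share) for each arc on the path from C1 to Cj."""
    card = float(root_cardinality)
    for fan_out, share in hops:
        card *= fan_out / share
    return card

# Illustrative path C1 -> C2 -> C3: fan-out 4 / share 2, then fan-out 3 / share 6.
print(class_cardinality(1000, [(4, 2), (3, 6)]))  # 1000 * 2 * 0.5 = 1000.0
```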

In order to show the utility of HCP, we conducted some analytical experiments to see the effect of the fan-out and the cardinality on the improvement in performance due to HCP. This improvement is characterized by the normalized IO metric, defined as follows:

Normalized IO = (# of IOs for the horizontally partitioned class collection) / (# of IOs for the unpartitioned class collection)

Note that a normalized IO value less than 1.0 implies that HCP is beneficial. We have conducted many experiments [4, 5] to study the effect of varying the fan-out between classes in a class composition hierarchy, and of varying the cardinality of a class, on performance. The results showed that the performance gain increases with the fan-out; thus, a high fan-out yields a considerable benefit from HCP. We note that the normalized IO is very low when all three classes in the class composition hierarchy are horizontally partitioned. The normalized IO is independent of the cardinality of the root class; that is, irrespective of the cardinality of the class we get the same percentage reduction in the number of disk accesses. The primary HCP may deteriorate the performance of some queries due to the overhead of applying the union operation in order to reconstruct the result of a query.

3. Derived horizontal class partitioning

Let q be a query defined in OQL as follows:

q: SELECT Ename
   FROM e Employee
   WHERE e.Dept.Proj.Location = “CA”

This query is called a backward query [22] because it involves traversing the path (e.Dept.Proj.Location) backwards, from the projects located at “CA” to the employee instances. Such queries are very costly to evaluate in the absence of suitable access structures, and most queries access multiple objects along a path in a class composition hierarchy. Even though indexing [22] and clustering [14] can help provide good access support at the physical level, the number of irrelevant objects retrieved during query processing can still be high.


Example 4. We present an example to give a progressive overview on how DHCP technique can reduce the cost of query execution. Let class Project in figure 1 be horizontally partitioned into two HCFs based on the attribute “Location”. Let HCF Proj1 be a HCF with all those projects that are located in “CA”, and let Proj2 be a HCF with all the projects located in “LA”. Assume that the class Employee has been partitioned based on the HCP of class Project into two HCFs: Emp1 and Emp2 such as Emp1 and Emp2 represent all employees working in a project located in “CA” and “LA”, respectively. Assume that we have to execute the query q specified above. We note that this query accesses only the HCF Proj1 and Emp1 , and it does not need to access Proj2 and Emp2 , because these two HCFs do not contribute to the answer of q. The HCP of class Employee based on the HCP of class Project, is known as derived horizontal class partitioning of class Employee. Let Ci be the class to be derived horizontally partitioned based on the class C j which is horizontally partitioned using the primary algorithm into α HCFs {PF 1 , PF 2 , . . . , PF α }, where every HCF PF j (1 ≤ j ≤ α) is defined by a qualification clause cl j . We notice that there is a path P from the class Ci to the class C j where: P = Ci · A1 · A2 · · · Ak , where Ak is either an attribute or a method of class C j . By definition, a derived HCF DF j (1 ≤ j ≤ α) of the class Ci is defined as: all instances in the class Ci satisfying the qualification clause of the HCF PF j (1 ≤ j ≤ α) of the class C j . Example 5. Let Employee class be derived horizontally partitioned based on the HCFs of the Project class. Let Projecti (1 ≤ i ≤ 5) and Emp j (1 ≤ j ≤ 5) be respectively, the project and employee HCFs. Considering the HCF Project1 which is defined by the following clause: (Duration ≤ 4) ∧ (Cost( ) > 7000) ∧ (Location = “CA”). 
This clause contains three predicates, p11: Duration ≤ 4, p12: Cost( ) > 7000, and p13: Location = "CA". The HCF Emp1 of class Employee is then defined as Emp1 = σ(E1∧E2∧E3)(Employee), where E1 = (Employee.Dept.Proj.Duration ≤ 4), E2 = (Employee.Dept.Proj.Cost( ) > $7000), and E3 = (Employee.Dept.Proj.Location = "CA"). σ is the selection operation [23], which retrieves the object instances of Employee satisfying the qualification clause E1 ∧ E2 ∧ E3. Similarly, we can express the other HCFs of Employee, i.e., Emp2, Emp3, Emp4, Emp5. The derived HCFs satisfy the properties of completeness and reconstruction, but not the property of disjointness. We recall these three properties below:
• Completeness: The horizontal partitioning of a class C into a set of HCFs F1, F2, . . . , Fα is complete if and only if every object instance belongs to at least one HCF Fi (1 ≤ i ≤ α).
• Reconstruction: The union of all HCFs reproduces the original class.
• Disjointness: Every object instance belongs to only one HCF.
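To make the definition concrete, the derivation step can be sketched in Python under assumed toy structures: object instances are dicts, a qualification clause is a predicate over a member-class instance, and `path` plays the role of the path P (all names are illustrative, not the paper's notation):

```python
# Sketch of derived horizontal class partitioning (DHCP): one derived
# HCF of the owner class per qualification clause of the member class.

def derive_fragments(owner_instances, path, primary_clauses):
    """owner_instances: object instances of the owner class Ci (dicts);
    path: maps an owner instance to its member-class instance
          (standing in for P = Ci.A1...Ak, e.g. e -> e.Dept.Proj);
    primary_clauses: one predicate per primary HCF of the member class.
    Returns one derived HCF (a list of instances) per primary clause."""
    return [[o for o in owner_instances if clause(path(o))]
            for clause in primary_clauses]

# Primary HCFs of Project split on Location, as in Example 4.
projects = [{"PName": "DB", "Location": "CA"},
            {"PName": "Web", "Location": "LA"}]
employees = [{"Ename": "Lee", "proj": projects[0]},
             {"Ename": "Kim", "proj": projects[1]},
             {"Ename": "Ann", "proj": projects[0]}]

clauses = [lambda p: p["Location"] == "CA",   # clause of Proj1
           lambda p: p["Location"] == "LA"]   # clause of Proj2
emp1, emp2 = derive_fragments(employees, lambda e: e["proj"], clauses)
```

With a single-valued path, every employee lands in exactly one derived HCF, so completeness and disjointness both hold; a multi-valued path would let the same instance qualify for several fragments, which is the overlap problem treated in Section 3.1.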


Figure 8. Non-overlap algorithm.

3.1. Non-overlapping derived horizontal class fragments

The HCFs defined by DHCP may overlap and thus may not satisfy the disjointness property. For example, let us assume that the class Project in figure 1 has been horizontally partitioned into two HCFs, Proj1 and Proj2, which represent all projects located in "CA" and "LA", respectively. Then the class Employee can be derived horizontally partitioned into Emp1 and Emp2, which correspond to those employees who work on projects located in "CA" and "LA", respectively. Consider an employee ei who works on more than one project, located in different locations ("CA" and "LA"), i.e., fan-out(Employee, Project) is greater than 1. In this case, the employee ei will be in both Emp1 and Emp2; that is, the HCFs Emp1 and Emp2 are not disjoint. We call the employee ei an overlapping instance. We now present an algorithm, called non-overlap, shown in figure 8. The input of this algorithm is the class Ci that is derived horizontally partitioned based on the HCFs of a class Cj. To find the overlapping instances of the class Ci, we check whether the fan-out(Ci, Cj) of each object instance of class Ci is greater than 1. To do so, for each instance we inspect the class composition hierarchy for the type of its complex attributes; if all the complex attributes are single-valued, then fan-out(Ci, Cj) is 1. Otherwise, if even one complex attribute is multi-valued, the fan-out(Ci, Cj) could be greater than 1. Note that rigorously checking whether two derived HCFs overlap would require examining each object instance of class Ci. When there is a multi-valued object-based instance variable, we execute the non-overlap algorithm. The idea behind this algorithm is to enumerate all HCFs containing a given overlapping instance oi, then calculate the access frequencies of the queries to these HCFs. Finally, we pick the HCF with the maximum access frequency to contain the overlapping instance oi, and we eliminate oi from the other HCFs.
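The idea can be sketched as follows, under assumed inputs: `fragments` maps each derived HCF name to its list of OIDs, and `freq` gives the access frequency of each HCF (both invented for illustration; figure 8 gives the paper's own formulation):

```python
# Sketch of the non-overlap algorithm: keep each overlapping instance
# only in its most frequently accessed HCF and log the eliminations.

def non_overlap(fragments, freq):
    # belong(oi): the set of HCFs that contain the instance oi.
    belong = {}
    for name, frag in fragments.items():
        for oid in frag:
            belong.setdefault(oid, []).append(name)
    catalog = []  # overlap catalog rows: (OID, kept HCF, removed-from HCF)
    for oid, hcfs in belong.items():
        if len(hcfs) < 2:
            continue                      # not an overlapping instance
        keep = max(hcfs, key=lambda h: freq[h])
        for h in hcfs:
            if h != keep:
                fragments[h].remove(oid)  # eliminate oi from other HCFs
                catalog.append((oid, keep, h))
    return catalog

# Employee e2 works on projects in both locations, so it overlaps.
frags = {"Emp1": ["e1", "e2"], "Emp2": ["e2", "e3"]}
catalog = non_overlap(frags, {"Emp1": 30, "Emp2": 10})
```

After the call the fragments are disjoint, and each catalog row has the three-column structure of the overlap catalog described in the text.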
The non-overlap algorithm is described in figure 8. Before introducing the algorithm, we define for each overlapping instance oi the set belong(oi), which represents the set of HCFs that contain oi. We note that this algorithm ensures the disjointness property. In order to facilitate the maintenance of the database, we create a table with three columns: the first column contains the OID(s) of the instance(s) eliminated by the non-overlap algorithm, the second contains the HCF which keeps these instances, and the third column contains the name of the HCF(s) from which these instances were eliminated. We call this table the overlap catalog. In our


example, let employee ei of Emp2 be eliminated by the non-overlap algorithm; this catalog will then contain the following instance: ⟨OID(ei), Emp1, Emp2⟩. The overlap catalog facilitates the identification and management of objects which belong to more than one HCF. During query processing, for any retrieval, the overlap catalog has to be checked to verify that the HCFs accessed are not missing any relevant overlapping object instances; otherwise, queries may give incomplete results. Since each object resides in only one HCF, updates need not be propagated. We advocate this solution for overlapping instances because many systems, such as ORION [3] and GemStone [26], use it for applications that involve many instances (i.e., data-intensive applications). The basic reason is that, due to the data redundancy introduced by allowing instances to belong to more than one fragment, both storage and update costs increase.

3.2. Evaluation of derived horizontal partitioning

In order to show the utility of DHCP, we conducted analytical experiments [6] to evaluate the effect of the fan-out and of the cardinality on the performance improvement due to DHCP. This improvement is characterized by the normalized IO metric defined in Section 2. To present interesting results while keeping control over the number of parameters needed to study the impact of parameter changes, we consider the following parameters: the cardinality of the root class (class Employee), the page size (PS), the number of objects per page (LC), the number of HCFs per class, the fan-out of the classes along the class composition hierarchy (Employee → Department → Project), and the selectivity of the predicates. We concentrate on the following cases of DHCP:

• 0D: only the class Project is horizontally partitioned, using the primary algorithm.
• 1D: the class Department is derived horizontally partitioned based on the HCFs of the class Project.
• 2D: both classes Employee and Department are derived horizontally partitioned based on the HCFs of the class Project.

The reason for selecting such a class collection is that it enables us to study both the impact of DHCP on the performance of path execution and the effect of the fan-out along the class composition hierarchy. The results [6] show that DHCP yields higher performance than unpartitioned classes under different database sizes and different fan-outs along the class composition hierarchy. Further, we categorized the queries into different types to quantify the exact benefit to each type of query. For example, there are six different types of queries on the schema shown in figure 1: queries that access only class Employee (type 1), classes Employee and Department (type 2), all three classes (type 3), only class Department (type 4), classes Department and Project (type 5), and only class Project (type 6). Our results show that DHCP most benefits queries which access all the classes of the class composition hierarchy (type 3), but can degrade performance for queries which access only one class (e.g., type 4). Thus, although useful, DHCP must be applied with caution, taking into consideration the query mix of the application processing environment.

4. Aspects of derived horizontal partitioning

There are two potential complications that need to be addressed for DHCP, namely, multiple primary horizontal partitioning candidates and multiple derived horizontal partitioning candidates. We call the class that is horizontally partitioned by the primary algorithm the member class, and the class that will be derived horizontally partitioned the owner class.

4.1. Multiple primary horizontal partitioning candidates

Let Ci be a class in a class composition hierarchy to be derived horizontally partitioned. We assume that there are m (m > 1) member classes (that are already horizontally partitioned). In this case, there are m possible ways to derived horizontally partition the owner class Ci, as shown in figure 9. For example, let the classes Project and Company be member classes, and suppose we want to perform derived horizontal partitioning of the class Employee (see figure 1). The question is then: how do we select, among the m member classes (Company and Project in our example), a good candidate for the DHCP of the class Ci (Employee in our example)? We present four ways to select possible candidates, namely, the share candidate, the frequency candidate, the number-of-HCFs candidate, and the fusion candidate.
• Share candidate: For each member class Cj (1 ≤ j ≤ m), we compute share(Ci, Cj). The best candidate is the member class with the minimum share with the class Ci. Suppose that the classes Project and Company in figure 1 are horizontally partitioned, and we want to derived horizontally partition the class Employee. We first calculate share(Employee, Project) and share(Employee, Company), then take the member class corresponding to the minimum share (say, Project); the Employee class is then derived horizontally partitioned based on the class Project. The intuition in

Figure 9. Multiple primary horizontal fragment candidates.

selecting the member class having the minimum share is that it reduces the number of overlapping instances belonging to two or more derived HCFs of the class Ci.
• Frequency candidate: Suppose there are m member classes as candidates for the DHCP of a class Ci. Note that each member class Cj (1 ≤ j ≤ m) has been horizontally partitioned based on a set of queries Qj (see Note 1). To select the member class to be used for DHCP, we calculate, for every member class Cj, the sum of the access frequencies of the queries on which it was partitioned. Finally, we select the member class with the maximum frequency.
• Number-of-HCFs candidate: The choice of the member class used in the partitioning of an owner class can be a decision problem addressed during the allocation process. In this case, the owner class is derived partitioned based on the member class with the smallest number of HCFs. This choice reduces the complexity of the allocation process.
• Fusion candidate: Let m be the number of member classes, and let nj (1 ≤ j ≤ m) be the number of HCFs of each member class Cj. The idea is to first horizontally partition the owner class Ci into n1 ∗ n2 ∗ · · · ∗ nm HCFs. The HCFs of the class Ci are accessed by a set of queries; we then merge the HCFs that are accessed by the same set of queries, to reduce the number of HCFs of the class Ci, as shown in figure 10.

Example 6. Let class Company be horizontally partitioned into two HCFs, namely, Comp1 and Comp2, which represent all companies that manufacture cars and trucks, respectively. We assume that class Project has been horizontally partitioned into five HCFs (see Section 2).

Figure 10. Fusion solution.


We can derived horizontally partition the class Employee based on the HCP of both class Company and class Project into 5 ∗ 2 = 10 HCFs: Emp1, Emp2, Emp3, . . . , Emp10. The derived HCFs of class Employee are represented as follows.

Emp1 = σ(E1∧E2∧E3∧E4)(Employee), where:
E1: Employee.Dept.Proj.Duration ≤ 4,
E2: Employee.Dept.Proj.Cost( ) > $7000,
E3: Employee.Dept.Proj.Location = "CA", and
E4: Employee.Own.Manufact.Name = "Cars".

Emp2 = σ(E1∧E2∧E3∧E4)(Employee),

where:
E1: 5 ≤ Employee.Dept.Proj.Duration ≤ 6,
E2: Employee.Dept.Proj.Cost( ) > $7000,
E3: Employee.Dept.Proj.Location = "CA", and
E4: Employee.Own.Manufact.Name = "Cars".

Similarly, we can represent the other Employee HCFs. If the HCFs Emp1 and Emp2 are most commonly used by the same queries, then they are merged into Emp12, defined as follows: Emp12 = σ(E1∧E2∧E3∧E4)(Employee), where:
E1: Employee.Dept.Proj.Duration ≤ 6,
E2: Employee.Dept.Proj.Cost( ) > $7000,
E3: Employee.Dept.Proj.Location = "CA", and
E4: Employee.Own.Manufact.Name = "Cars".

The advantages and disadvantages of the above solutions are listed in figure 11. The share solution gives a good basis for deciding which class to use for derived horizontal partitioning, but the value of share is very difficult to estimate [8]. The frequency solution is simple to implement and can improve the performance of the system, but if the query frequencies change, or queries are added or deleted, then repartitioning of the class may be needed [20]. The fusion solution is more costly than the other solutions, because it requires considering n1 ∗ n2 ∗ · · · ∗ nm HCFs, plus the additional

Figure 11. The advantages and disadvantages of the share, frequency and fusion approaches.

Figure 12. Multiple derived horizontal fragment candidates.

cost of the merge procedure, but it can generate derived HCFs that meet the query requirements (a best-fit DHCP), because the algorithm has full knowledge of the HCFs of the member classes as well as of the queries. Note that the cost model can be used to select the best candidate class for DHCP. Let S, FQ, NF, and FU be the member classes obtained by the share, frequency, number-of-HCFs, and fusion candidates, respectively, and let MC be the union of all possible candidates (i.e., MC = S ∪ FQ ∪ NF ∪ FU). For each candidate in MC, we apply our cost model to calculate the cost of executing the set of queries, and finally we pick the candidate that gives the minimum cost.
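A sketch of this final selection step, where `cost_of` stands in for the paper's cost model and the cost table and class names are invented for illustration:

```python
# Pick, among all candidate member classes MC = S ∪ FQ ∪ NF ∪ FU,
# the one with the minimum total cost over the given query set.

def best_candidate(candidates, queries, cost_of):
    mc = set(candidates)  # taking the union removes duplicate candidates
    return min(mc, key=lambda c: sum(cost_of(c, q) for q in queries))

# Toy per-query costs; Project is cheaper for every query here.
costs = {"Project": 5, "Company": 9}
pick = best_candidate(["Project", "Company", "Project"],  # S, FQ, NF/FU
                      ["q1", "q2"],
                      lambda c, q: costs[c])
```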

4.2. Multiple derived horizontal fragment candidates

Suppose a class Cj in a class composition hierarchy has been horizontally partitioned using the primary algorithm. In this case, there is at least one path from the root class (C1) to Cj. Let P = C1.A1 · · · Aj be a path (see figure 12) with length β (β > 1). This path contains β classes, each of which can be derived horizontally partitioned based on the HCFs of the class Cj. The question is then: which class among these β classes should be derived horizontally partitioned? For example, assume that the class Project is a member class. There is a path P = Employee.Dept.Proj between the class Employee and the class Project. On P, there are two classes (namely Employee and Department) which can be derived horizontally partitioned based on the HCFs of the class Project. As shown in Section 3.2 on the performance of DHCP, we should select the class among the β classes so as to improve the performance of processing a given set of queries on the OODB system. To do so, we derived horizontally partition a candidate class Ci based on the member class Cj, then execute the set of queries on the class composition hierarchy taking into account the partitioning of the class Ci and of the member class, and compute the cost of processing the given set of queries. We do this for each of the β classes, as shown in figure 13, and finally pick the candidate that gives the minimum cost.

5. Fragment localization and object migration

The primary or derived HCP generates a set of HCFs of a class. When a query is submitted, the query optimizer needs to identify the HCFs that contribute to the result of the query.


Figure 13. Algorithm for selecting the owner class to be derived horizontally fragmented.

Figure 14. Object query processing methodology for an unpartitioned class.

By loading into main memory only those HCFs needed by the query, the optimizer can reduce the overall cost of processing queries. This task is performed by HCF localization. In order to perform HCF localization, we assume a query processing methodology (such as [29]) for unpartitioned classes, shown in figure 14.

Definition 5. A conjunctive query consists of simple or component predicates connected with "∧", representing the logical AND.

In this paper we concentrate on conjunctive queries, which are the most widely used and are also the ones considered by most previous researchers [15, 16]. Note that each HCF Fi is represented by a clause cli, and each conjunctive query has a qualification clause S. HCF localization uses a boolean function, called valid(cli, S), to identify the HCFs needed for executing a given query. This function takes cli and S as inputs and is evaluated as follows [16]:


1. (cli ∧ S) is unsatisfiable, implying that the HCF Fi defined by the clause cli will not contribute to the query answer; there is no need to access this HCF to process the query. In this case, the result of valid(cli, S) is false.
2. cli implies S, implying that the HCF Fi will contribute to the query result. In this case, the result of valid(cli, S) is true.

To evaluate the two scenarios above, we face the following classical problems: 1) the satisfiability problem (SAT), "Is cli ∧ S satisfiable?", and 2) the implication problem (IMP), "Does cli imply S?". These two problems are central to database systems, where they have been studied fairly intensively [15, 16]. These studies showed that the problems are NP-hard in general [16], but there exist efficient heuristics to solve the SAT and IMP problems [15, 16]. One such heuristic was developed by Guo et al. [16], who considered the SAT problem ("Is S satisfiable?") for a conjunctive query S involving predicates of the form (X op value), where op ∈ {<, ≤, =, ≠, ≥, >}. They proposed an efficient algorithm to solve the SAT problem whose complexity is O(|S|), where |S| is the number of predicates of S. For the implication problem ("Does S imply S′?"), they developed an algorithm which uses the same steps as the one for the SAT problem. Finally, the authors introduced a set of axioms to compute a finite set of inequalities (predicates) equivalent to S; these axioms generalize Ullman's axioms [31]. The complexity of this algorithm is min(|S|^2.376 + |S′|, |S| × |S′|), where |S′| is the number of predicates of S′. For more details, refer to [16].
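For intuition only, valid(cli, S) can be sketched for purely numeric conjunctive range predicates by intersecting per-attribute intervals; this replaces the general SAT/IMP machinery of [16] and approximates strict bounds, so it is an illustration, not their algorithm:

```python
import math

# A clause is encoded as {attribute: (low, high)}; the conjunction of
# clauses is satisfiable iff every intersected interval is nonempty.

def satisfiable(*clauses):
    merged = {}
    for clause in clauses:
        for attr, (lo, hi) in clause.items():
            mlo, mhi = merged.get(attr, (-math.inf, math.inf))
            merged[attr] = (max(mlo, lo), min(mhi, hi))
    return all(lo <= hi for lo, hi in merged.values())

def valid(cl_i, s):
    # A fragment must be read iff cl_i ∧ S is satisfiable.
    return satisfiable(cl_i, s)

# Project1's clause has Cost() > 7000 (strict bound approximated);
# the query of Example 7 asks for Cost() < 6500, so they conflict.
proj1 = {"Cost": (7000.01, math.inf), "Duration": (-math.inf, 4)}
query = {"Cost": (-math.inf, 6499.99)}
```

Here valid(proj1, query) is false, matching case 1 above: Project1 need not be accessed.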

5.1. Routing algorithm

Now we have all the tools to define an algorithm (called routing) that selects the HCFs needed for executing a given query q. Let C be a class that is horizontally partitioned into α HCFs {F1, F2, . . . , Fα}. By executing the algorithm in figure 15, we determine which HCF(s) will contribute to the result of the query q.

begin
   for i := 1 to α do        % α represents the number of HCFs
(1)    if valid(cl_i, S) then
(2)      routing[i] := 1;
end

Figure 15. Routing algorithm.

Example 7. To illustrate the routing algorithm, we consider the following example. Let class Project be horizontally partitioned into five HCFs, each represented by a clause. Let q be a query defined on the class Employee that consists of retrieving the


Figure 16. Object query processing methodology for a horizontally partitioned class.

name of all employees who work on projects located at "CA" with Cost( ) < $6500. The qualification clause of this query is: S: e.Dept.Proj.Location = "CA" ∧ e.Dept.Proj.Cost( ) < $6500. Initially, all values of the routing vector are equal to 0. For each HCF of the class Project, we evaluate the valid function as follows: valid(Project1, S) = 0, valid(Project2, S) = 1, valid(Project3, S) = 0, valid(Project4, S) = 1, valid(Project5, S) = 0. The final vector is (0 1 0 1 0), which means that query q will be executed only on Project2 and Project4.

The HCF localization layer plays a very important role for query processing in partitioned OODBSs. Without it, a query has to access all the HCFs of a class to generate its result. The routing algorithm can also be included in the methods and invoked at run time by applying the method transformation technique developed in [19]. Therefore, the query execution methodology in the OODBS that governs the processing and execution of queries is transformed as shown in figure 16. This technique is also applicable for transforming methods that access horizontally partitioned classes. We refine the query processing methodology for unpartitioned classes shown in figure 14 by adding one module, called the routing module, which indicates the HCFs that need to be accessed by the query, using the horizontal partitioning schema (which contains all HCFs represented by their clauses), as shown in figure 16. When a query q arrives, the routing module executes the algorithm described in figure 15.

Complexity of the routing algorithm. The main statement in the routing algorithm of figure 15 is the valid(cli, S) function. As we need to check whether valid is "true" or "false", we use the algorithms for the SAT and IMP problems developed in [16], applied to the problems (Is cli ∧ S satisfiable?) and (Does cli imply S?), respectively. The overall complexity of the routing algorithm is min(|cli|^2.376 + |S|, |cli| × |S|).
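The routing module's behavior can be sketched as follows; the fragment clauses, query, and witness objects are hypothetical stand-ins, and joint satisfiability is checked by evaluating predicates over a small witness set rather than by the SAT test of [16]:

```python
# Build the routing vector: routing[i] = 1 iff fragment i's clause and
# the query clause S are jointly satisfied by some witness object.

def route(fragment_clauses, s, witnesses):
    routing = [0] * len(fragment_clauses)
    for i, cl in enumerate(fragment_clauses):
        if any(cl(o) and s(o) for o in witnesses):
            routing[i] = 1
    return routing

# Three hypothetical Project fragments and the query of Example 7.
frags = [lambda p: p["Location"] == "CA" and p["Cost"] > 7000,
         lambda p: p["Location"] == "CA" and p["Cost"] <= 7000,
         lambda p: p["Location"] == "LA"]
q = lambda p: p["Location"] == "CA" and p["Cost"] < 6500
witnesses = ([{"Location": "CA", "Cost": c} for c in (5000, 8000)]
             + [{"Location": "LA", "Cost": 5000}])
vec = route(frags, q, witnesses)
```

Only the second fragment can contribute, so the query is executed on that fragment alone, in the same way the routing vector of Example 7 prunes Project1, Project3, and Project5.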

5.2. Object migration

In this section, we consider the effect of updates on the object instances of the HCFs (see Note 2), and we present the problem of object migration. When an attribute value is modified by an update query, some objects may migrate from one HCF to another.


Algorithm. For an update operation, let U be the set of attributes whose values are modified. Each element A ∈ U is called an update attribute; it has two values, an old value and a new value.

Example 8. Let qu be an update query that consists in extending by one year the duration of the project whose PId is 1234. The update attribute of qu is Duration.

The steps of the object migration algorithm are:

Input: an update query, and the set U of update attributes.
For each update attribute A ∈ U do
1. Find out in which HCF the object o to be modified is stored. This step is realized by the HCF localization. Let Fi be the HCF where this object is stored.
2. Read the old value of this object.
3. Perform the required modification of this object, which gives the modified object o′.
4. Determine the HCF Fj corresponding to the object o′. This step is done by the HCF localization (see Section 5).
5. Check the migration condition: if Fi = Fj, then no migration is needed; otherwise, migrate the object o′ to the HCF Fj.

Example 9. Let qu be the update query defined above. We assume that the project pr with PId 1234 is in Project1 (see Section 2) and is defined as I: ⟨1234, "DB", 4, "CA", 2⟩. The old value of the update attribute Duration is 4. After the update query, the instance becomes I′: ⟨1234, "DB", 5, "CA", 2⟩. The new value of the update attribute is 5, which does not satisfy the predicate on the Duration attribute in Project1 (i.e., Duration ≤ 4) but satisfies the predicate of the HCF Project3 (i.e., 5 ≤ Duration ≤ 6). Therefore, the project pr is migrated to the HCF Project3.

The object migration algorithm has the same complexity as HCF localization, because Steps 1, 4, and 5 are done using HCF localization.
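The five steps above can be sketched as follows, with clause evaluation standing in for the HCF localization layer; the fragment clauses follow the running example (Project1: Duration ≤ 4, Project3: 5 ≤ Duration ≤ 6), while the data layout is assumed:

```python
# Sketch of object migration on update: locate, modify, relocate.

def locate(obj, clauses):
    """Steps 1/4: find the HCF whose clause the object satisfies."""
    for name, clause in clauses.items():
        if clause(obj):
            return name
    return None

def apply_update(obj, attr, new_value, fragments, clauses):
    src = locate(obj, clauses)   # step 1: HCF currently holding obj
    obj[attr] = new_value        # steps 2-3: read old value, modify
    dst = locate(obj, clauses)   # step 4: HCF matching the new value
    if src != dst:               # step 5: migrate only if the HCF changed
        fragments[src].remove(obj)
        fragments[dst].append(obj)
    return src, dst

clauses = {"Project1": lambda p: p["Duration"] <= 4,
           "Project3": lambda p: 5 <= p["Duration"] <= 6}
pr = {"PId": 1234, "PName": "DB", "Duration": 4, "Location": "CA"}
fragments = {"Project1": [pr], "Project3": []}

# Example 8/9: extend the duration of project 1234 by one year.
src, dst = apply_update(pr, "Duration", 5, fragments, clauses)
```

As in Example 9, the modified project no longer satisfies Project1's clause and migrates to Project3.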

6. Conclusion & future work

Horizontal partitioning is a well studied problem in relational databases that has shown its utility in reducing irrelevant instance accesses when processing a given set of queries. The problem has only recently been studied in the context of OODBs. The early work in this area proposed a high-complexity solution based on existing solutions for relational databases; these solutions do not apply the most efficient algorithms available for relational databases and, further, disregard the completeness of the solution provided. In contrast, in this paper we modify the best algorithm available for relational databases, making suitable changes to handle the complexity of the object-oriented data model when generating horizontal partitioning. Further, we studied the problem of derived horizontal partitioning, proposed solutions for overlapping derived HCFs, and addressed the pertinent issues of multiple primary HCF candidates and multiple derived HCF candidates.


Finally, we addressed the important problems of fragment localization and object migration. Thus, at the logical database level we have provided a complete solution: a set of algorithms and support for horizontally partitioning an OODB schema accessed by a set of queries. We have also worked on a cost model to evaluate the effectiveness of HCP, and we are currently designing algorithms for query processing and optimization on horizontally partitioned object databases. Other problems, such as adaptive HCP that takes into consideration changes in the queries, are yet to be addressed.

Notes
1. Each query qh of Qj is executed with a certain frequency fj.
2. These HCFs are obtained either by the primary algorithm or by the derived algorithm.

References
1. M. Atkinson, F. Bancilhon, D.J. DeWitt, K.R. Dettrich, D. Maier, and S. Zdonik, "The object database system manifesto," in Proceedings of the First International Conference on Deductive, Object-Oriented Databases, December 1989, pp. 223–240.
2. J. Banerjee, W. Kim, and K.C. Kim, "Queries in object oriented databases," in Proceedings of the IEEE Data Engineering Conference (ICDE'88), February 1988, pp. 31–38.
3. J. Banerjee, W. Kim, H.J. Kim, and H.F. Korth, "Semantics and implementation of schema evolution in object-oriented databases," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'87), May 1987, pp. 311–322.
4. L. Bellatreche, K. Karlapalem, and G.B. Basak, "Query-driven horizontal class partitioning in object-oriented databases," in Proceedings of the 9th International Conference on Database and Expert Systems Applications (DEXA'98), August 1998, pp. 692–701. Lecture Notes in Computer Science 1460.
5. L. Bellatreche, K. Karlapalem, and G.K. Basak, "Horizontal class partitioning for queries in object oriented databases," Technical Report HKUST-CS98-6, 1998.
6. L. Bellatreche, K. Karlapalem, and Q. Li, "Derived horizontal class partitioning in OODBSs: Design strategy, analytical model and evaluation," in 17th International Conference on the Entity Relationship Approach (ER'98), November 1998, pp. 465–479. Lecture Notes in Computer Science 1507.
7. L. Bellatreche, K. Karlapalem, and A. Simonet, "Horizontal class partitioning in object-oriented databases," in Proceedings of the 8th International Conference on Database and Expert Systems Applications (DEXA'97), September 1997, pp. 58–67. Lecture Notes in Computer Science 1308.
8. E. Bertino, "On modeling cost functions for object-oriented databases," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 3, pp. 500–508, May/June 1997.
9. E. Bertino and C. Guglielmina, "Path-index: An approach to the efficient execution of object-oriented queries," Data & Knowledge Engineering, vol. 10, pp. 1–27, 1993.
10. E. Bertino, M. Negri, G. Pelagatti, and L. Sbattella, "Object-oriented query languages: The notion and the issues," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 3, pp. 223–237, 1992.
11. S. Ceri, M. Negri, and G. Pelagatti, "Horizontal data partitioning in database design," in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1982, pp. 128–136.
12. W.S. Cho, C.M. Park, K.Y. Whang, and S.H. So, "A new method for estimating the number of objects satisfying an object-oriented query involving partial participation of classes," Information Systems, vol. 21, no. 3, pp. 253–267, 1996.
13. C.I. Ezeife and K. Barker, "A comprehensive approach to horizontal class fragmentation in distributed object based system," International Journal of Distributed and Parallel Databases, vol. 3, no. 3, pp. 247–272, 1995.
14. G. Gardarin, J.-R. Gruser, and Z.-H. Tang, "A cost-based selection of path expression processing algorithms in object-oriented databases," in Proceedings of the 22nd International Conference on Very Large Databases (VLDB'96), 1996, pp. 390–401.
15. S. Guo, S. Wei, and M.A. Weiss, "On satisfiability, equivalence, and implication problems involving conjunctive queries in database systems," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 4, pp. 604–612, August 1996.
16. S. Guo, S. Wei, and M.A. Weiss, "Solving satisfiability and implication problems in database systems," ACM Transactions on Database Systems, vol. 21, no. 2, pp. 270–293, 1996.
17. P.J. Haas and A.N. Swami, "Sampling-based selectivity estimation for joins using augmented frequent value statistics," in Proceedings of the International Conference on Data Engineering (ICDE'95), 1995, pp. 522–531.
18. K. Karlapalem and Q. Li, "Partitioning schemes for object oriented databases," in Proceedings of the Fifth International Workshop on Research Issues in Data Engineering—Distributed Object Management (RIDE-DOM'95), March 1995, pp. 42–49.
19. K. Karlapalem, Q. Li, and S. Vieweg, "Method induced partitioning schemes in object-oriented databases," in 16th International Conference on Distributed Computing Systems (ICDCS'96), Hong Kong, May 1996, pp. 377–384.
20. K. Karlapalem, S.B. Navathe, and M. Ammar, "Optimal redesign policies to support dynamic processing of applications on a distributed database system," Information Systems, vol. 21, no. 4, pp. 353–367, 1996.
21. A. Kemper and G. Moerkotte, "Access support in object bases," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'90), 1990, pp. 364–374.
22. A. Kemper and G. Moerkotte, "Physical object management," in Modern Database Systems: The Object Model, Interoperability, and Beyond, Won Kim (Ed.), Addison-Wesley/ACM Press, 1995, pp. 175–202.
23. W. Kim, "A model of queries for object-oriented databases," in Proceedings of the 15th International Conference on Very Large Databases (VLDB'89), August 1989, pp. 423–432.
24. W. Kim, Introduction to Object-Oriented Databases, MIT Press, 1990.
25. Y. Ling and W. Sun, "An evaluation of sampling-based size estimation methods for selections in database systems," in Proceedings of the International Conference on Data Engineering (ICDE'95), 1995, pp. 532–539.
26. D. Maier, J. Stein, A. Otis, and A. Purdy, "Development of an object-oriented DBMS," in Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA'86), vol. 21, no. 11, November 1986, pp. 472–482.
27. S.B. Navathe, K. Karlapalem, and M. Ra, "A mixed partitioning methodology for distributed database design," Journal of Computer and Software Engineering, vol. 3, no. 4, pp. 395–426, 1995.
28. S.B. Navathe and M. Ra, "Vertical partitioning for database design: A graphical algorithm," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'89), 1989, pp. 440–450.
29. M.T. Özsu and J. Blakeley, "Query optimization and processing in object-oriented database systems," in Modern Database Management—Object-Oriented and Multidatabase Technologies, Won Kim (Ed.), Addison-Wesley/ACM Press, 1994, pp. 146–174.
30. M.T. Özsu and P. Valduriez, Principles of Distributed Database Systems, Prentice Hall, 1991.
31. J.D. Ullman, Principles of Database and Knowledge-Base Systems, vol. II, Computer Science Press, 1989.
