
Parallel Processing Evaluation of Path Expressions

Flavio Tavares

André Victor

Marta Mattoso

Department of Computer Science - COPPE/UFRJ
PO Box: 68511, Rio de Janeiro, RJ, Brazil, Zip Code: 21945-970
Telephone: +55+21+590-2552, Fax: +55+21+290-6626

Abstract

Parallel and distributed processing are alternatives to optimize queries in Database Systems. In this work different alternatives for parallel query processing were implemented and evaluated. This evaluation aims at analyzing the potential for parallel processing of these query strategies and providing heuristics to query optimizers. The experiments were made with an IBM SP/2 parallel machine. Performance evaluation used the datasets and queries specified by the OO7 benchmark. The results indicated the best query execution strategy for different path expressions analyzed. The tests also showed a significant parallel potential for the backward, also known as pointer-based join, execution strategy. Nevertheless, the forward execution strategy, also known as naive pointer chasing, has proven its effectiveness when objects from a small collection point to objects of a large collection in the path expression, outperforming the backward algorithm both in parallel and serial executions.

1. Introduction

Objects typically have complex structures, basically because an attribute type can be a reference to other objects. Path expressions, also known as reference chains, paths, complex predicates or implicit joins, are responsible for accessing complex objects. Optimizing path expression processing is a central and difficult issue in object query languages. Path expressions can be found in OQL (Object Query Language, from the ODMG [3] standard), SQL:1999 (the SQL extension that includes object references in queries) and XML query languages. Shekita and Carey [14] have shown limitations in using algorithms from the relational model for path expression traversal in the OO (Object Oriented) model. In fact, most DBMS products that support path expressions implement specific algorithms that exploit references instead of values.

One of the problems in evaluating and comparing existing works on path expression processing is the lack of a universally accepted object algebra. Thus, algorithm proposals for operators representing path expressions vary in terminology, processing strategies and object representation models, making it difficult to compare their results. However, there is a consensus on the object algebra operators used for rewriting path expressions, since most related works adopt the map and join operators [13, 15]. Therefore, in this work we focus on these operators in order to evaluate different proposed algorithms. Map materializes object references in relationships within path expressions. The join operator, in contrast, processes relationships by performing a join operation analogous to the traditional relational join. The main difference lies in the use of references (pointers through object identifiers, OIDs) instead of values (keys). Basically, map works on individual instances while join works on collections of references. Most sequential algorithms [1, 8, 14] implement the map and join operators.

In this work, we selected two representative algorithms from the literature and evaluated their behavior. To represent the map operator, we implement the algorithm called individual reference (or naive pointer based), and to represent the join operator, we implement the hybrid-hash join algorithm proposed by Shekita and Carey [14]. Even though these operators can be combined for rewriting a path expression, we focus on analyzing the potential of each one separately to reduce the complexity of path expression evaluation.

Shekita and Carey [14] compare different join-based algorithms, while Keller et al. [8] analyze pointer-based algorithms. More recently, Braumandl et al. [1] compare algorithms involving both individual references and joins using OIDs, presenting an experimental analysis complemented by simulations. However, they ignored a crucial aspect for choosing the best strategy: the cardinality variation of the operand collections within a path expression and the effect of selectivity factors. Therefore, the results obtained may not apply when high variations of cardinalities occur, thus compromising the potential of both strategies.

Our work presents an evaluation of path expression processing algorithms using parallel techniques. Although the parallel algorithms shown here can be found in the literature [5, 16], previous analyses present the same limitations found in the sequential evaluations. Also, these works do not evaluate individual reference-based algorithms. According to DeWitt et al. [5], traversing individual references across distributed object bases may cause heavy remote access, and parallel processing may become prohibitive. However, object grouping, fragmentation and communication strategies between processors may be used to overcome these difficulties.

Our evaluation aims at analyzing the potential for parallel processing of query strategies and providing heuristics to query optimizers. The experiments were made with an IBM SP/2 parallel machine. Performance evaluation used the datasets and queries specified by the OO7 benchmark [2]. The results indicate the best query execution strategy for the different path expressions analyzed. The tests also showed a significant parallel potential for the pointer-based join, backward execution strategy.
Nevertheless, the forward execution strategy, using individual references (or naive pointer chasing), has proven its effectiveness when objects from a small collection point to objects of a large collection in the path expression, outperforming the pointer-based join algorithm both in parallel and serial executions. The results obtained clearly show the importance of having both classes of algorithms in the DBMS.

The remainder of this paper is organized as follows. Section 2 presents a formal definition for path expressions and the main algorithms proposed in the literature. Section 3 describes our experiments and the parallelism solutions adopted in ParGOA. A case study based on the OO7 benchmark is presented in section 4. The results obtained are discussed in depth in section 5, where a concluding analysis suggests heuristics for choosing the best algorithm. Finally, section 6 draws some conclusions, final considerations and future works.

2. Path Expressions Processing

Let {C_1, C_2, ..., C_n} be a set of related collections. Each collection C_i contains a set of objects of type T_i. From the initial collection C_1 to the collection C_{n-1}, objects are linked through attributes: A_i is an attribute of type T_i whose values are references to either a single object or a collection of objects of type T_{i+1}. A qualified path expression, or path expression, is an expression of the form:

X_1[P_1].A_1.X_2[P_2].A_2.X_3[P_3] ... A_{n-1}.X_n[P_n]    (1)

where X_i is a variable representing an object from collection C_i, A_i is a relationship attribute and P_i is a predicate that qualifies C_i objects. P_i is optional and can be empty, which means that all objects from collection C_i are valid for the query result. The length of the path expression is n (the number of collections). When at least one of the attributes in the path expression is an object collection, the path expression is called multi-valued. Analogously, when all attributes are references to other objects and not collections, the path expression is called mono-valued. Additionally, C_1 is called the root collection, while C_n is called the target collection.

An alternative to improve path expression processing is through parallelism. In relational systems, parallel query techniques are quite straightforward because only uniform sets of data are processed through pre-defined relational algebra operations. In the OO model, however, relationships through object pointers may cause references to objects placed in different nodes. Therefore, parallel queries involving path expressions, i.e. pointer navigation, may increase communication between nodes and random disk access, thus decreasing parallelism due to load imbalance. It is important to evaluate reference-based and join-based strategies with parallel path expression processing, since the behavior of sequential algorithms may not be the same in a parallel environment.

Previous works from the literature examine the behavior of parallel path expression processing. DeWitt et al. [5] extended the work of Shekita and Carey [14] by evaluating the behavior of these algorithms through simulations. In addition, these algorithms were modified to address one-to-many relationships (collection type attributes). Furthermore, two variations of the hybrid-hash algorithm and a new one called probe-child were proposed. These algorithms are more flexible than Shekita and Carey’s algorithms in the sense that they do not require a class extension for attributes of collection type. As in [14], the authors discarded the reference-based analysis, even though some conditions are favorable to this strategy. Another limitation found in [5] was the lack of cardinality and selectivity variation, which hides the specific potential of the strategies.

3. ParGOA – the experimental environment

Our experiments were conducted on ParGOA, a parallel object management system developed at COPPE/UFRJ. ParGOA is a parallel extension of GOA++ [10] for parallel processing of path expressions. GOA++ is a system that provides persistence and object collection management services, such as query processing, disk grouping of objects, cache management and indexing techniques. Queries are analyzed and classified in two categories: (i) mono-valued path expressions and (ii) multi-valued path expressions. Both categories are implemented with the two approaches specified in this work: reference-based algorithms and join-based algorithms. Queries of category (i) intuitively suggest reference-based execution, while queries of category (ii) suggest join-based execution. However, results may show the opposite when there are cardinality and selectivity variations in the queries.

ParGOA uses a “master-slave” parallel processing model, where a central process (master) controls and synchronizes the execution of the other processes (slaves), which actually execute queries. Also, for each node that holds an object base fragment from a fragmented base, there are two processes being executed: one executes queries and the other is responsible for retrieving object references requested by other nodes in the parallel machine. In this case, the remote node requests the page where the object is stored.

3.1. ParGOA Architecture

ParGOA architecture (Figure 1) was proposed by Tavares [17]. Its main components are the Task Manager and the Local Server. Each system node has a Local Server that is responsible for parallel processing of queries. Three distinct processes, which work simultaneously in each local base, compose a Local Server: Communication Manager, Local Query Processor and Volume Manager.

[Figure 1 diagram: ParGOA comprises a Task Manager (Fragmentation Manager, Query Manager) and a set of Local Servers, each with a Communication Manager, a Volume Manager and a Query Processor over its own fragment (Fragment 1 ... Fragment n).]

Figure 1 – ParGOA architecture

In each Local Server, the Volume Manager is responsible for controlling a Page Server List. This list has length n, where n is the number of Local Servers in the system, and is composed of one Local Page Server and n-1 Remote Page Servers. Because of this implementation, object access, local or remote, is done uniformly in ParGOA and follows the same rule as in its sequential version. The Object Manager requests an object from the Volume Manager through its OID. If the object is stored locally, the Local Page Server returns its associated page and object access works as in the sequential GOA++ version. If the object is stored in another node, the returned page is a page of the Remote Page Server related to the node where the object is stored. If the Remote Page Server does not have a copy of that page in its cache, it requests the page from the Communication Manager. With this cache approach, we intend to decrease communication between nodes. The results show a communication reduction, evidencing the efficiency of this cache mechanism. A remote node uses objects stored in other nodes through an object copy stored in a remote cache. However, when shared pages are updated in this communication model, an elaborate mechanism for cache coherency is mandatory. Ostoff et al. [11] present an efficient proposal for this problem in GOA.

The Local Query Processor is the process that actually executes a query. It uses two path expression processing strategies: reference-based and join-based. It receives a query from the Query Manager, processes it and returns an OID list of the objects that satisfy the predicates.
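The uniform local/remote access path described above can be illustrated with a minimal Python sketch. It is not ParGOA code: the class names mirror the text, but the OID layout `(node, page_id, slot)`, the `fetch_remote` callback standing in for the Communication Manager, and the `remote_fetches` counter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PageServer:
    """One entry of the Page Server List; it serves the pages owned by one node."""
    node: int
    is_local: bool
    pages: dict = field(default_factory=dict)  # page_id -> list of objects
    remote_fetches: int = 0                    # pages requested from another node

    def get_page(self, page_id, fetch_remote):
        if page_id not in self.pages:
            if self.is_local:
                raise KeyError(page_id)        # a local page must already exist
            # Cache miss on a remote page: ask the owning node once, keep a copy.
            self.pages[page_id] = fetch_remote(self.node, page_id)
            self.remote_fetches += 1
        return self.pages[page_id]


class VolumeManager:
    """Holds n page servers: one local and n-1 remote (hypothetical layout)."""

    def __init__(self, local_node, n_nodes, local_pages, fetch_remote):
        self.fetch_remote = fetch_remote
        self.servers = []
        for node in range(n_nodes):
            is_local = node == local_node
            pages = dict(local_pages) if is_local else {}
            self.servers.append(PageServer(node, is_local, pages))

    def get_object(self, oid):
        # Assumed OID layout: (owning node, page id, slot within the page).
        node, page_id, slot = oid
        page = self.servers[node].get_page(page_id, self.fetch_remote)
        return page[slot]
```

The caller never distinguishes local from remote objects; only the first access to a remote page triggers a fetch, which is the behavior the remote cache is meant to provide.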

The Task Manager is responsible for receiving queries expressed in OQL from clients and sending them to the corresponding Local Servers. It has two components: the Fragmentation Manager and the Query Manager. The Fragmentation Manager locates the nodes involved in the query. It uses a fragmentation table that stores information about the fragmentation scheme. The Query Manager distributes local queries to the Local Servers indicated by the Fragmentation Manager. When the local Query Processors finish their queries, they send their local OID lists to the Query Manager, which creates a unified OID list containing all selected objects processed in all distributed bases.

3.2. Path Expression Strategies

Two path expression strategies were evaluated: reference-based and join-based.

SELECT a FROM a in AtomicParts WHERE a.part_of.document.date > 10/01/1990

Figure 2 – Query expressed in OQL with mono-valued path expression
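As a rough illustration of the two strategies on the query of Figure 2, the following Python sketch evaluates the same predicate forward (reference-based) and backward (join-based) over a toy in-memory object base. The data, the dictionary representation of objects and the reading of "10/01/1990" as a calendar date are assumptions; the backward version uses plain hash-set membership as a stand-in for the hybrid-hash join of [14], which it does not implement.

```python
from datetime import date

# Toy object base, OID -> object (all values are made up for illustration).
documents = {"d1": {"date": date(1995, 3, 1)}, "d2": {"date": date(1985, 6, 1)}}
composite_parts = {"c1": {"document": "d1"}, "c2": {"document": "d2"}}
atomic_parts = {
    "a1": {"part_of": "c1"},
    "a2": {"part_of": "c2"},
    "a3": {"part_of": "c1"},
}
CUTOFF = date(1990, 10, 1)  # assumed reading of "10/01/1990"


def reference_based():
    """Forward: start at each root object and chase pointers to the target."""
    result = []
    for oid, part in atomic_parts.items():
        composite = composite_parts[part["part_of"]]
        document = documents[composite["document"]]
        if document["date"] > CUTOFF:
            result.append(oid)
    return sorted(result)


def join_based():
    """Backward: filter the target extension, then join OID sets toward the root."""
    doc_oids = {oid for oid, d in documents.items() if d["date"] > CUTOFF}
    comp_oids = {
        oid for oid, c in composite_parts.items() if c["document"] in doc_oids
    }
    return sorted(
        oid for oid, a in atomic_parts.items() if a["part_of"] in comp_oids
    )
```

Both functions return the same OIDs; they differ in how many objects they touch. The forward version dereferences two pointers per root object regardless of the predicate, while the backward version scans each extension once and carries only OID sets between steps.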

In the reference-based strategy, the path expression in Figure 2, AtomicPart → CompositePart (through the part_of pointer) → Document (through the document pointer), is navigated. When an object from the Document class is reached, the query processor evaluates the predicate, that is, whether the date attribute is greater than 10/01/1990.

In the join-based strategy, query predicates are evaluated against the extension of the target class (the Documents extension in Figure 2). After predicate evaluation (document.date > 10/01/1990) in each local fragment of the Documents extension, each Local Server has an OID list of the Document objects that satisfy the predicate. Each server sends a copy of its list to all other servers. After merging its own list with the ones sent by other servers, each server joins the new list with the local fragment of the next extension in the path expression (the CompositeParts extension in the example). The local join is executed by the hybrid-hash algorithm as specified in [14]. The resulting list is sent to all other servers and the same process is repeated for all collections in the path (in this case, the AtomicParts extension). After all the joins of all class extensions in the path, each server has one resulting local list. These lists are sent to the Query Manager, which merges the local lists and returns the complete query result to the client.

3.3. Computational Environment

Our experimental environment consists of a completely dedicated IBM SP2 with 32 THIN processors, each with 256 Mbytes of RAM, and 8 WIDE processors, each with 1 Gbyte of RAM. ParGOA, however, needs to store local fragments on local disks. Only 12 out of the 40 SP2 processors have local disks with enough capacity for storing local bases (we used the medium configuration of the OO7 benchmark). Consequently, the maximum configuration used by ParGOA had 12 THIN processors. In each test, processors were used exclusively. The nodes are connected by a 40 MBytes/s Omega switch. All our systems communicate using the PVM (Parallel Virtual Machine) library for synchronization and message passing between processors.

4. Case Study

The OO7 benchmark [2] has been, in recent years, the standard tool for performance evaluation in ODBMS [6, 7, 12, 16]. Its aim is to test performance in query processing using object collections with varying cardinalities. In addition, operations such as cache and disk access, object inclusion, exclusion and updating may be evaluated. Some modifications were made to the OO7 class scheme (Figure 3): the Module, ComplexAssembly and Manual classes were excluded for simplification, even though their absence does not impact the results. The OO7 medium configuration is presented in Table 1.

Table 1 – OO7 medium configuration

Collection        Cardinality
CompositeParts    500
Documents         500
AtomicParts       100000
Connections       300000

[Figure 3 diagram: classes DesignObj (Id, Type, BuildDate), BaseAssembly, CompositePart, AtomicPart (X, Y), Connection (Type, Length) and Document (Title, Date), linked by the relationships ComponentsShared, ComponentsPriv, RootPart, Parts, Documentation, To and From.]

Figure 3 – OO7 class diagram adapted

We used static horizontal fragmentation of objects, because of the distributed memory architecture. The date attribute provides uniform distribution among the nodes, and derived horizontal fragmentation in class Document improves local access when the CompositeParts-Document relationship is used. Objects are locally grouped by class extension. In addition, the number of fragments created was equal to the number of processors of the configuration, as suggested in [9]. Therefore, the fragmentation strategy is as follows:

• CompositeParts ⇒ primary fragmentation by value range (date attribute).
• Documents ⇒ derived fragmentation based on CompositeParts.
• AtomicParts ⇒ primary fragmentation by value range (date attribute).

5. Result Analysis

The results correspond to a representative subset of the results obtained in [17]. Five fragmented object base configurations were used in the tests, with a different fragmentation schema for each configuration (1, 2, 4, 8 and 12 processors at a time). Results correspond to the queries shown in Tables 2 and 3. Response time corresponds to “wall clock” time [4], i.e., user waiting time: the system time is read at two moments, before the Task Manager requests the query execution and after it receives the query results. Therefore, response time covers the execution of all query phases, from the selection of the nodes that will participate in the execution until each Local Server returns the partial results that the Task Manager merges to compose the final result. I/O and communication times are also included, since they occur within these phases.

Table 2 – OQL queries involving mono-valued path expressions

Query   Mono-valued path expression                                                  Selectivity
A       select a from a in AtomicParts where a.part_of.document.date < 10/01/1990   90%
B       select a from a in AtomicParts where a.part_of.document.date < 10/01/1910   10%

Table 3 – OQL queries involving multi-valued path expressions

Query   Multi-valued path expression                                            Selectivity
C       select c from c in CompositeParts, a in c.parts where a.x < 100000     10%
D       select c from c in CompositeParts, a in c.parts where a.x < 900000     90%

Response times are reported as averages for “cold” execution (first query execution) and “hot” execution (subsequent executions). Each query was executed 20 times to obtain each average. Tables with remote accesses and cache behavior are also presented to complement the analysis and help interpret execution times.

5.1. Mono-Valued Path Expressions

5.1.1. Reference-Based

It is important to note that queries of this group involve remote processor access. Even though the CompositeParts and AtomicParts extensions have been horizontally fragmented by value range over the date attribute, there is no relationship between date values in these two classes. Consequently, a CompositePart object may be stored in a different node from the AtomicPart objects that compose it. Figures 4 and 5 present the speedup against the number of processors for the execution of queries A and B. These speedups are shown in contrast with linear speedup. The corresponding processing times are shown in Tables 4 and 5.

[Figures 4 and 5 plot speedup against the number of processors (1, 2, 4, 8, 12), together with the linear speedup.]

Figure 4 – Cold execution speedup

Figure 5 – Hot execution speedup

Table 4 – Cold execution averages

           # Processors
Query      1        2        4        8        12
Query A    99.531   53.361   27.422   13.519   9.994
Query B    95.431   50.077   26.007   13.496   9.395

Table 5 – Hot execution averages

           # Processors
Query      1        2        4        8        12
Query A    99.344   53.778   27.729   13.96    9.32300
Query B    94.697   50.837   25.610   13.91    8.7652
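As a worked check of how the times in Table 4 relate to the speedup curves, the short Python sketch below computes speedup and parallel efficiency from the cold execution averages. It assumes the usual definitions (speedup is the one-processor time divided by the p-processor time; efficiency is speedup divided by p), which the text does not state explicitly.

```python
# Cold execution averages copied from Table 4, keyed by number of processors.
cold_times = {
    "A": {1: 99.531, 2: 53.361, 4: 27.422, 8: 13.519, 12: 9.994},
    "B": {1: 95.431, 2: 50.077, 4: 26.007, 8: 13.496, 12: 9.395},
}


def speedup(times):
    """Speedup S(p) = T(1) / T(p) for every configuration p."""
    base = times[1]
    return {p: base / t for p, t in times.items()}


def efficiency(times):
    """Efficiency E(p) = S(p) / p; 1.0 corresponds to linear speedup."""
    return {p: s / p for p, s in speedup(times).items()}


if __name__ == "__main__":
    for query, times in cold_times.items():
        for p, s in speedup(times).items():
            print(f"Query {query}, {p:2d} processors: speedup {s:.2f}, "
                  f"efficiency {efficiency(times)[p]:.2f}")
```

Under these definitions, query A reaches a speedup of just under 10 on 12 processors in cold execution, that is, an efficiency above 0.8.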

Despite the remote accesses to the ParGOA cache (Table 6), the speedups of queries A and B stayed close to linear, because the number of local and remote cache accesses in each processor decreases. As more processors are added to the system, each fragment of a collection extent gets smaller, so main memory becomes large enough to store the fragments. It is important to notice that remote accesses caused by cache misses occur only during cold execution. The efficiency of the hot execution reflects the locality of objects in the pages, a consequence of the object grouping approach. In addition, the page copies maintained by the Remote Page Servers in the remote cache have also contributed to the cache hit ratio, avoiding communication overhead.

Table 6 – ParGOA’s cache behavior in each processor of queries A and B

Cold Execution of A
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10660552    3023       -         -
2        4358906     1839       42138     24
4        1957787     833        20233     16
8        1082207     465        14953     13
12       505249      222        1616      6

Hot Execution of A
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10660544    3017       -         -
2        4358886     1835       42162     0
4        2200859     931        20555     0
8        1148641     491        10233     0
12       505129      218        2186      0

Cold Execution of B
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10660552    3023       -         -
2        4358906     1839       77030     13
4        1957787     833        18814     16
8        1082827     465        14953     11
12       505249      222        1977      7

Hot Execution of B
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10660544    3017       -         -
2        4367876     1839       42162     0
4        2175179     920        19435     0
8        1082669     461        9049      0
12       505129      218        1984      0

The cache behavior of ParGOA (Table 6) is almost identical in both queries, even though the result cardinalities are different. Therefore, the experiments show that the reference-based strategy is not affected by the size (or selectivity factor) of the target collection.

5.1.2. Join-Based

Figures 6 and 7 present the speedup against the number of processors for the execution of queries A and B using the join-based path expression processing strategy. These speedups are shown in contrast with linear speedup. The corresponding processing times are shown in Tables 7 and 8. In Figures 6 and 7, the curves for queries A and B are labeled A-J and B-J, respectively, to indicate the join-based strategy.

[Figures 6 and 7 plot the speedup of queries A-J and B-J against the number of processors (1, 2, 4, 8, 12), together with the linear speedup.]

Table 7 – Cold execution averages

             # Processors
Query        1        2        4        8        12
Query A-J    78.199   39.079   19.683   10.120   8.372
Query B-J    72.978   37.162   18.626   9.905    7.862

Table 8 – Hot execution averages

             # Processors
Query        1        2        4        8        12
Query A-J    77.585   39.079   19.686   10.36    8.362
Query B-J    74.193   37.063   18.626   10.21    7.771

We can observe that join-based execution times are lower than those of the reference-based strategy in all configurations. In addition, the advantage of the join-based strategy grows as more processors are added to the system (up to 4 processors). This is due to the communication overhead associated with each of the two strategies. With join-based execution, communication between nodes occurs through the exchange of OID lists, instead of the objects themselves as happens with the map operator of the reference-based strategy. Therefore, there is less data traffic between system nodes.

Table 10 – Join-based strategy improvements against reference-based (“hot” times)

               # Processors
Improvements   1      2      4      8      12
Query A        28%    37%    40%    34%    11%
Query B        27%    37%    37%    36%    13%
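The percentages in Table 10 can be related to the hot execution times with a short Python sketch. The formula used here, (reference time - join time) / join time, is an assumption: the text does not define the improvement measure, but this definition lands within one percentage point of every entry in Table 10. The input times are the hot execution averages as read here from Tables 5 and 8.

```python
# Hot execution averages, keyed by number of processors.
hot_reference = {  # Table 5 (reference-based)
    "A": {1: 99.344, 2: 53.778, 4: 27.729, 8: 13.96, 12: 9.323},
    "B": {1: 94.697, 2: 50.837, 4: 25.610, 8: 13.91, 12: 8.7652},
}
hot_join = {  # Table 8 (join-based)
    "A": {1: 77.585, 2: 39.079, 4: 19.686, 8: 10.36, 12: 8.362},
    "B": {1: 74.193, 2: 37.063, 4: 18.626, 8: 10.21, 12: 7.771},
}
reported = {  # Table 10, in percent
    "A": {1: 28, 2: 37, 4: 40, 8: 34, 12: 11},
    "B": {1: 27, 2: 37, 4: 37, 8: 36, 12: 13},
}


def improvement(reference, join):
    """Assumed definition: extra time of reference-based relative to join-based, in %."""
    return {p: 100.0 * (reference[p] - join[p]) / join[p] for p in reference}


if __name__ == "__main__":
    for query in reported:
        computed = improvement(hot_reference[query], hot_join[query])
        for p, value in computed.items():
            print(f"Query {query}, {p:2d} processors: "
                  f"computed {value:.1f}%, reported {reported[query][p]}%")
```

The computed values rise from 1 to 4 processors and then fall sharply at 12 processors, which is the pattern the surrounding discussion relies on.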

[Figure 8 plots the “hot” execution times of queries A, B, A-J and B-J against the number of processors (1, 2, 4, 8, 12).]

Figure 8 – Comparing “hot” times of both strategies

Figure 8 contrasts the execution times of both strategies. The curves in Figure 8 appear to converge, particularly with 8 and 12 processors. However, Table 10 shows a relevant difference in the performance of the two strategies (especially up to 4 nodes). This difference diminishes as more nodes are added to the system, since the data sets become smaller. It should be noted, however, that from 1 to 4 processors the difference increases, evidencing the potential of the hybrid-hash algorithm. These results confirm the model proposed in [14]. The intuitive idea that mono-valued path expressions should be processed by pointer navigation is thus invalidated. Table 11 (ParGOA’s cache behavior) clearly shows the absence of remote accesses in join-based execution. In this kind of execution, remote objects are not requested and communication between nodes is limited to OID lists.

Table 11 – ParGOA’s cache behavior of queries A-J and B-J

Cold Execution of A-J
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10664834    3025       -         -
2        5273252     1513       0         0
4        2185661     927        0         0
8        1476819     619        0         0
12       747528      321        0         0

Hot Execution of A-J
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10664824    3021       -         -
2        5270749     0          0         0
4        2202154     0          0         0
8        1477438     0          0         0
12       720817      0          0         0

Cold Execution of B-J
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10664171    3025       -         -
2        5320674     1517       0         0
4        2185552     927        0         0
8        1448148     619        0         0
12       721402      310        0         0

Hot Execution of B-J
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit       Miss
1        10664161    3021       -         -
2        5270556     0          0         0
4        2202055     0          0         0
8        1448756     0          0         0
12       720799      0          0         0

We believe that configurations with more than 8 processors show no further speedup because each base fragment becomes small enough to fit entirely in the memory of its processor. We expect that, with a larger object base, the improvements would be more evident for configurations with more than 12 processors.

5.2. Multi-Valued Path Expressions

5.2.1. Reference-Based

This section presents query results for multi-valued path expressions. In queries C and D, the parts attribute of the CompositePart class is a collection of objects from the AtomicPart class. The two queries are identical, except for their selectivity factors. This factor has a great influence on the choice of the most adequate strategy. Query C returns 10% of the objects from the AtomicParts extension, while query D returns 90% of them. Figures 9 and 10 show the speedup of response time against the number of processors for the execution of queries C and D, in contrast with linear speedup, considered the ideal speedup. The corresponding response times can be found in Tables 12 and 13.

[Figures 9 and 10 plot the speedup of queries C and D against the number of processors (1, 2, 4, 8, 12), together with the linear speedup.]

Almost linear speedup was obtained in the path expression execution for every system configuration in the “hot” execution. The decrease in local and remote accesses observed during query execution resulted in this excellent performance.

Table 12 – Cold execution averages

           # Processors
Query      1        2        4         8        12
Query C    67.206   34.29    17.544    12.21    8.307
Query D    66.378   34.02    19.425    9.156    7.679

Table 13 – Hot execution averages

           # Processors
Query      1        2        4         8        12
Query C    66.012   32.899   18.9258   9.8418   7.221
Query D    65.208   37.630   18.8557   9.7958   7.234

Tables 12 and 13 present slight differences between the execution times of these two queries. Once again, the reference-based algorithm for multi-valued path expressions is not affected by the selectivity factor of the target collection.

Table 14 – ParGOA’s cache behavior of queries C and D

Cold Execution of C
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit        Miss
1        10114735    1018       -          -
2        2602067     396        2418803    121
4        607568      177        439138     24
8        199882      100        282545     14
12       90448       48         53524      4

Hot Execution of C
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit        Miss
1        10115738    0          -          -
2        2602296     0          2556626    0
4        548721      0          439166     0
8        199918      0          282563     0
12       87622       0          53528      0

Cold Execution of D
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit        Miss
1        10114064    1018       -          -
2        2599828     396        2556498    128
4        511227      168        492861     32
8        168412      95         282545     14
12       90448       48         35218      3

Hot Execution of D
         Local Accesses         Remote Accesses
Procs    Hit         Miss       Hit        Miss
1        10115068    0          -          -
2        2556626     0          2418924    0
4        511351      0          525329     0
8        168443      0          219683     0
12       77059       0          31440      0

Analyzing Table 14, we can observe that there is no cache miss in ParGOA during hot execution, either in local or in remote accesses. The lack of cache misses shows that the processing of these queries benefits from the approach of grouping objects in pages, in this case for the AtomicParts extension, the largest extension involved in these queries.

5.2.2. Join-Based

Figures 11 and 12 show the speedup of processing time against the number of processors for the execution of queries C and D, in contrast with linear speedup, considered the ideal speedup. The corresponding response times can be found in Tables 15 and 16. The selectivity factor of the predicates is the only difference between queries C and D. In query C (the AtomicParts extension is traversed and the x attribute is compared with the query constant), 10000 AtomicPart objects are selected, while 90000 objects are selected in query D. Therefore, in the second phase (join) of the sequential version, query C executes a join of 500 x 10000 objects (a collection join involving objects from the CompositePart and AtomicPart classes through the parts attribute of the CompositePart class), while query D executes a join over 500 x 90000 objects.

It is important to note a difference in behavior between queries C and D under the join-based strategy. While in the reference-based strategy both queries present similar execution times, in the join-based strategy query D is slower. This clearly shows that the reference-based and join-based algorithms are influenced differently by selectivity factors. The reference-based strategy is not affected by the selectivity factor of the target collection, because the object traversal is done over the root collection (the same number of operations in both situations). In the join-based strategy, a selectivity reduction (in the root, target or any other collection) influences the performance, because fewer objects participate in the join and consequently the performance can improve.

[Figures 11 and 12 plot the speedup of queries C-J and D-J against the number of processors (1, 2, 4, 8, 12), together with the linear speedup.]

In the parallel versions, as more nodes are added to the system, the number of base fragments grows. Therefore, the cardinality of the object collections involved in the joins decreases, and so does the difference in execution time between the two queries.

Table 15 – Cold execution averages

             # Processors
Query        1         2         4        8        12
Query C-J    96.524    44.296    23.818   12.267   9.511
Query D-J    167.944   50.64     26.145   15.708   11.771

Table 16 – Hot execution averages

             # Processors
Query        1         2         4        8        12
Query C-J    96.445    46.300    23.974   12.411   8.631
Query D-J    162.611   50.1349   27.043   15.940   12.007

The join phase is also responsible for the worse performance of the join-based strategy in contrast to the reference-based one. However, even though the join-based performance was initially worse than the reference-based performance (mainly in the sequential version), as more nodes are added to the system the execution time decreases faster in the join-based strategy than in the reference-based strategy. Therefore, while the join-based strategy would be discarded for query C in sequential execution, as can be seen in the response time for 1 processor, it may be an interesting alternative when parallel processing is available. For query D, however, the join-based strategy is not the best choice, either in parallel or in sequential execution. Although the curves in Figure 13 present similar execution times for both strategies, Tables 17 and 18 show that, while the response times of both strategies converge in query C, the join-based strategy still has the worse performance in query D. Therefore, for multi-valued path expressions, the join-based strategy is strongly affected by the selectivity factor.

Table 18 – Reference-based strategy improvements against join-based (“hot” times)

Improvements   # Processors
               1      2      4      8      12
Query C        46%    41%    27%    26%    19%
Query D        149%   33%    43%    63%    66%

[Figure 13 plot; series: Query A, Query B, Query A-J, Query B-J; x-axis: 1, 2, 4, 8 and 12 processors; y-axis: 0 to 120]

Figure 13 – Comparing “hot” times of both strategies

These results are important because they show that the reference-based strategy outperforms the join-based strategy in some well-defined situations. However, Shekita and Carey [14] as well as DeWitt et al. [5] ignored the reference-based approach in their analyses. In addition, the authors of [14] did not explore multi-valued path expressions.

5.3. Heuristics

The results clearly show that the cardinality of the extensions involved and the selectivity factor influence the selection of the best strategy for path expression processing. Let RC → TC be a path expression of length two (binary relationship), where RC (Root Class) is the first class extension of the path and TC (Target Class) is the class extension whose objects are related to objects from RC and upon which predicates are evaluated. An algorithm that finds the best alternative for executing RC → TC, based on the experimental results presented in this work, is shown below:

ChoosingBestStrategy( RC, TC ) {
  case Card(RC) = HIGH and Card(TC) = LOW do
    choose JOIN;
  case Card(RC) = HIGH and Card(TC) = HIGH do
    if SelFactor(TC) = LOW then choose REFERENCE;
    else choose JOIN;
  case Card(RC) = LOW and Card(TC) = HIGH do
    if SelFactor(TC) = HIGH then choose JOIN;
    else choose REFERENCE;
  case Card(RC) = LOW and Card(TC) = LOW do
    choose REFERENCE;
}
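The case analysis of ChoosingBestStrategy can be written as a small runnable function. This is a minimal sketch that mirrors the four cases exactly as listed. The `Level` and `Strategy` enumerations and the function name are hypothetical, and since the text does not say how a cardinality or a selectivity factor is classified as HIGH or LOW, the caller is assumed to supply those levels.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


class Strategy(Enum):
    JOIN = "join"
    REFERENCE = "reference"


def choose_best_strategy(card_rc, card_tc, sel_factor_tc):
    """Pick a strategy for RC -> TC following the four listed cases.

    card_rc, card_tc and sel_factor_tc are Level values; deciding what
    counts as HIGH or LOW is left to the caller (an assumption here).
    """
    if card_rc is Level.HIGH and card_tc is Level.LOW:
        return Strategy.JOIN
    if card_rc is Level.HIGH and card_tc is Level.HIGH:
        if sel_factor_tc is Level.LOW:
            return Strategy.REFERENCE
        return Strategy.JOIN
    if card_rc is Level.LOW and card_tc is Level.HIGH:
        if sel_factor_tc is Level.HIGH:
            return Strategy.JOIN
        return Strategy.REFERENCE
    # Remaining case: Card(RC) = LOW and Card(TC) = LOW.
    return Strategy.REFERENCE


print(choose_best_strategy(Level.HIGH, Level.LOW, Level.LOW).name)
```

A query optimizer could call such a function once per binary relationship in a path, after estimating cardinalities and selectivity from catalog statistics.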

Queries whose target extension is smaller than the root extension and that involve mono-valued path expressions (as presented in Section 5.1) should use the join-based strategy, as shown in [14] and in the comparison of execution times and improvements obtained with both strategies for queries A and B. In this case, the join-based execution presented lower execution times in all configurations and greater speedup as more processors were added to the system. The response time of the join-based execution decreases as more processors are added.

When the target extension is larger than the root extension, the best strategy choice is influenced by the selectivity factor of the predicates. For query C (selectivity factor equal to 10% of the target extension), although join-based execution presents higher execution times than reference-based execution, the response times converge with parallelism, mainly for the 12-processor configuration. Therefore, in this case, the join-based alternative is as valid as the reference-based one. For query D (selectivity factor equal to 90% of the target extension), reference-based execution presents lower execution times in all configurations. Therefore, for high selectivity factors, the reference-based strategy is the most adequate solution.

6. Conclusions

This work analyzes two strategies that the query processor may use for path expression processing in the context of parallel object-oriented databases. The evaluated strategies are called join-based and pointer-based. We have presented experimental results obtained on top of the ParGOA system, a parallel object persistence manager developed at COPPE/UFRJ, using an IBM SP2 parallel computer. The experimental studies were conducted on a distributed memory architecture using horizontal fragmentation techniques. We also evaluated different system configurations, varying the number of processors used (1, 2, 4, 8 and 12).

The experimental studies clearly show the advantages of each path expression processing strategy, suggesting that a DBMS should implement both algorithms and benefit from their combination. It is also clear that the size of the operand collections has a strong influence on both strategies. We could therefore point out some situations in which the pointer-based strategy performs better than the join-based one. Likewise, we noticed that the pointer-based strategy (sometimes referred to as “naïve pointer-based”) greatly benefits from techniques such as object clustering, and thus should be considered for both sequential and parallel processing.

We are currently evaluating the combination of both strategies in the processing of the same path expression, when this path expression has a length greater than two. Future work includes further discussion of important issues identified in this work, and additional experiments on a larger database that does not fit in the memory of the 12-processor scenario. Another interesting approach would be the evaluation of other join-based algorithms, such as those in [1].

Acknowledgements - The authors would like to thank Fernanda Baião for reviews and discussions, the CAPES and CNPq funding agencies, and LNCC (National Laboratory for Scientific Computation).

References

[1] Braumandl, R., Claussen, J., Kemper, A., 1998. Evaluating Functional Joins Along Nested Reference Sets in Object-Relational and Object-Oriented Databases. Proceedings of the 24th VLDB Conference, pp. 110-121, New York, USA.
[2] Carey, M.J., DeWitt, D.J., Naughton, J.F., 1993. The OO7 Benchmark. Proc. of the 1993 ACM SIGMOD International Conference on Management of Data, Vol. 22(2), pp. 12-21, Washington, USA.
[3] Cattel, R.G. et al., 2000. The Object Database Standard: ODMG 3.0. Morgan Kaufmann Publishers.
[4] Crow, L.A., 1994. How to Measure, Present and Compare Parallel Performance. IEEE Parallel & Distributed Technology, Vol. 2, No. 1, pp. 9-25.
[5] DeWitt, D.J., Lieuwen, D., Mehta, M., 1993. Parallel Pointer-based Join Techniques for Object-Oriented Databases. Proc. of the Intl. IEEE Conf. on Parallel and Distributed Information System, pp. 172-181, California, USA.
[6] DeWitt, D.J., Naughton, J.F., Shafer, J.C., Venkataraman, S., 1996. Parallelizing OODMS traversals: a performance evaluation. VLDB Journal, Vol. 5, No. 1, pp. 3-18.
[7] Gardarin, G., Gruser, J.R., Tang, Z.T., 1996. Cost-based Selection of Path Expression Processing Algorithms in Object-Oriented Databases. Proc. of the 22nd VLDB Conference, pp. 390-401, India.
[8] Keller, T., Graefe, G., Maier, D., 1991. Efficient Assembly of Complex Objects. Proceedings of ACM-SIGMOD International Conference on Management Data, pp. 148-157.
[9] Manish, M., DeWitt, D., 1997. Shared-Nothing Parallel Database Systems. VLDB Journal, Vol. 3 (10), pp. 53-72.
[10] Mauro, R.C., Zimbrão, G., Brügger, T.S., Tavares, F.O., Duran, M., Lima, A.A.B., Pires, P.F., Bezerra, E., Soares, J.A., Baião, F.A., Mattoso, M.L.Q., Xexéo, G., 1997. GOA++: Tecnologia, Implementação e Extensões aos Serviços de Gerência de Objetos. XII Brazilian Symposium on Databases, pp. 272-277, Fortaleza, Brazil. (in Portuguese)
[11] Osthoff, C., Bianchini, R., Seidel, C., Mattoso, M., Amorim, C., 1999. Explorando Conceitos e Mecanismos de Memória Compartilhada Distribuída em E/S Paralela. XI Brazilian Symposium on Computer Architectures, pp. 287-292, Natal, Brazil. (in Portuguese)
[12] Özsu, M.T., Voruganti, K., Unrau, R.C., 1999. An Asynchronous Avoidance-Based Cache Consistency Algorithm for Client Caching DBMSs. Proceedings of the 22nd VLDB Conference, pp. 440-451, Bombay, India.
[13] Shaw, G., Zdonik, S., 1990. A Query Algebra for Object-Oriented Databases. Proceedings of the 6th International Conference on Data Engineering, pp. 154-162.
[14] Shekita, E., Carey, M., 1990. A Performance Evaluation of Pointer-Based Joins. Proceedings of the ACM SIGMOD International Conference of Management of Data, pp. 300-311.
[15] Straube, D.D., Özsu, M.T., 1991. Execution Plan Generation for an Object-Oriented Data Model. Proc. of the 2nd Int. Conf. on Deductive Object-Oriented Databases, pp. 43-67, Springer-Verlag.
[16] Su, S.Y.W., Ranka, S., He, X., 1999. Performance Analysis of Parallel Query Processing Algorithms for Object-Oriented Databases. Technical Report, Database Systems Research and Development Center, Univ. of Florida, USA.
[17] Tavares, F.O., 1999. Avaliação de Processamento Paralelo de Consulta no Modelo Orientado a Objetos. M.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, Brazil. (in Portuguese)
