

Parallel Query Execution in PRISMA/DB

Annita N. Wilschut and Peter M. G. Apers

1 Introduction

Recently, much attention has been paid to using multi-processor systems to improve the performance of relational database management systems (DBMSs). Many workshops and conferences about this topic have been organized, and their proceedings give an extensive account of recent developments. A lot of effort has been spent on developing efficient algorithms for individual relational operations (selection, projection, cartesian product, join, union, intersection and difference). Algorithms for both single- and multi-processor environments are readily available, and they have been compared analytically, by means of simulation, and via experimentation with implementations [Vald84, Brat84, Dewi84, DeWi85, Schn89]. These algorithms optimize individual operations without regard for their behavior in a nested query. On the other hand, the theory of query optimization tries to rearrange nested query trees in such a way that they can be evaluated efficiently, using the cheapest algorithms for individual operations [Ceri84]. In this paper, the possibilities for parallelism in the execution of nested queries on PRISMA/DB are studied. In this study, both the algorithms for individual operations and the order in which operations are executed are taken into account. This paper is organized as follows: the remainder of this introduction is about parallelism and introduces PRISMA. Section 2 describes query execution on PRISMA/DB and derives a model for this execution. The possibilities for parallelism are identified. Section 3 is on algorithms for individual operations and their behavior in a parallel environment, and it introduces pipelining algorithms. Join algorithms are used as an example in this section and the next one. Section 4 studies the implications of the results found in Section 3 for query optimization. Section 5 concludes and summarizes the paper.

1.1 Parallelism

Multi-processor systems try to gain performance through concurrent execution of computations. An elaborate coverage of the taxonomy of parallelism can be found in [Bora85] and [Wils89]. Here, only the forms that are relevant to this paper are defined.

task-spreading: A task is decomposed into a number of similar subtasks that are each executed independently, on different parts of the data, on different processors. The results of the subtasks are eventually combined to form the result. Task-spreading requires a coordinating process that hands out the subtasks and collects the results if necessary. If the subtasks consist of equal amounts of work, the speedup of task-spreading is expected to be proportional to the number of processors involved. This form of parallelism is called (pure) parallelism in [Bora85].
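To make task-spreading concrete, here is a minimal sketch in Python (not from the paper; the names `task_spread`, `subtask` and `combine` are illustrative). A coordinating process splits the data into chunks, hands each chunk to a worker, and combines the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

def task_spread(data, subtask, n_parts, combine):
    """Split data into n_parts chunks, run subtask on each chunk
    concurrently, and combine the partial results."""
    chunks = [data[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(subtask, chunks))
    return combine(partials)

# Example: a selection (tuples with value > 10) spread over 4 workers.
tuples = list(range(25))
result = task_spread(tuples,
                     subtask=lambda chunk: [t for t in chunk if t > 10],
                     n_parts=4,
                     combine=lambda parts: sorted(sum(parts, [])))
```

With equally sized chunks of equal per-tuple cost, the expected speedup is proportional to the number of workers, as stated above.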

The work reported in this document was conducted as part of the PRISMA project, a joint effort with Philips Research Laboratories Eindhoven, partially supported by the Dutch "Stimuleringsprojectteam Informaticaonderzoek (SPIN)".


pipelining: A task is decomposed into a number of different subtasks that have to be executed consecutively on the same data stream. The subtasks can be assigned to different processors. Every subtask reads its input from its predecessor and sends its output to its successor. Subtasks are activated when the first data reach them. When the first data reach the last subtask before the first subtask is done, all subtasks work simultaneously until the first subtask is done. Because of this staging in the execution, it is hard to predict the performance gain from pipelining.
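Pipelining between subtasks can be sketched with Python generators (an illustration only; in PRISMA the subtasks run as separate processes on separate nodes, whereas generators interleave on one processor). Each stage consumes tuples from its predecessor as soon as they are produced:

```python
def select(source, pred):
    # Stage 1: filters tuples and forwards each one immediately.
    for t in source:
        if pred(t):
            yield t

def project(source, attr):
    # Stage 2: activated as soon as the first tuple reaches it.
    for t in source:
        yield t[attr]

base = [{"id": i, "v": i * i} for i in range(10)]
pipeline = project(select(iter(base), lambda t: t["v"] > 20), "id")
result = list(pipeline)  # ids whose square exceeds 20
```

Note the staging described above: the `project` stage produces nothing until the first tuple passes the `select` stage.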

The use of task-spreading and pipelining to speed up the execution of a single operation is dealt with in [Schn89, Rich87]. Relational operations are usually part of larger query trees, though, and pipelining between operations during the execution of a query tree can be used in dataflow database systems. To profit optimally from pipelining, the operations in the pipeline must be especially suited to this type of query execution.

1.2 PRISMA

The PaRallel Inference and Storage MAchine PRISMA is a highly parallel machine for data and knowledge processing. The hardware configuration is a general-purpose, shared-nothing multi-processor system [Ston86]. Such architectures are frequently used these days to implement DBMSs on [DeWi88, Bora88]. A shared-nothing architecture consists of a number of nodes that each comprise a processor, a local memory and (maybe) a disk. The nodes are interconnected via a high-bandwidth communication network. The PRISMA machine contains 100 nodes that each contain a data processor, a communication processor and 16 Mbyte of local memory. 50 nodes have a disk, and some nodes have an ethernet card that provides an interface with a host computer. Each communication processor connects a node to 4 other nodes. In this way a fast, high-bandwidth network is provided. An extensive introduction to the system can be found in [Kers87] and in [Wils89]. Here, only the features that are important for this paper are summarized. The machine is designed to support a relational main-memory database management system, PRISMA/DB. The entire database is stored in main memory. Disks are provided for backup. To gain performance and to make storage in main memory feasible, the tuples belonging to one relation are distributed over more than one node (not necessarily over all 100 nodes).

2 Query Execution in PRISMA/DB

In PRISMA/DB, relations are fragmented horizontally into fragments: each fragment of a relation contains part of the tuples of the relation. The fragmentation is disjoint and complete, so each tuple belongs to exactly one fragment. A fragment has a process associated with it that executes operations on that fragment. Such a process is called a One Fragment Manager (OFM). Both base data and intermediate data are managed by OFMs. OFMs for base fragments are created at system startup; OFMs for intermediate data are created during query execution. The result of an operation is sent to the OFM that needs it for further processing, if it is an intermediate result, or to the user, if it is an end result. Intermediate results can be fragmented; in that case, the output of an operation is distributed over more than one OFM. After finishing a transaction, the OFMs managing intermediate results are disposed of. The base OFMs stay alive waiting for a next transaction that needs their data. If enough processors are available, each OFM can have its private processor. As a simplification, it will be assumed that enough processors are available, because we want to understand the behavior of such a dataflow system with independent processes before the more difficult situation, in which different operations have to share one processor, is tackled. The process of query execution is governed by a transaction manager, which activates the OFMs of the base data participating in a query and generates intermediate OFMs. Each transaction has its private transaction manager process.
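The fragmentation scheme can be sketched as follows (a Python illustration under assumed names; the real OFMs are processes on separate nodes, not objects in one address space). Hash-partitioning on a key attribute yields a disjoint and complete fragmentation, and each fragment gets its own manager:

```python
def fragment(relation, n_fragments, key):
    # Disjoint and complete: each tuple is hashed to exactly one fragment.
    fragments = [[] for _ in range(n_fragments)]
    for t in relation:
        fragments[hash(t[key]) % n_fragments].append(t)
    return fragments

class OneFragmentManager:
    # An OFM executes operations on the single fragment it manages.
    def __init__(self, frag):
        self.frag = frag

    def select(self, pred):
        return [t for t in self.frag if pred(t)]

rel = [{"id": i} for i in range(8)]
ofms = [OneFragmentManager(f) for f in fragment(rel, 3, "id")]
# A selection runs on every fragment; the union of the partial
# results is the result on the whole relation.
selected = [t for m in ofms for t in m.select(lambda t: t["id"] % 2 == 0)]
```

Because every tuple lands in exactly one fragment, running an operation on all OFMs and collecting the outputs never loses or duplicates tuples.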

[Figure: rdag of the query — selections on base fragments A, B, C and D feed two joins, whose results feed a third join, topped by a projection.]

Figure 1: rdag representation of ((A ⋈ B) ⋈ (C ⋈ D))

2.1 Execution Model

Query execution in PRISMA/DB fits in the dataflow model for parallel query execution [Alex88], [DeWi88]. In such a model, the data that is processed in a query is assumed to flow along various operation processes (OFMs in the PRISMA context) that start processing the data when it is available. The operations can work in parallel or pipelined, depending on their position in the query. An execution plan for a query on a relational database can be represented as a rooted directed acyclic graph (rdag). The nodes in the graph correspond to operations on data, the incoming edges of a node specify the operands of an operation, and the outgoing edges point to the next operation to be performed on the data. The leaves of such an rdag (nodes without incoming edges) represent operations on base data, and the root basically absorbs the result of the query. Figure 1 shows a query and its rdag representation. The query in the figure actually is a tree, which is a special rdag. A non-tree rdag emerges when intermediate results are fragmented. This way of representing queries corresponds to the dataflow model of query execution.
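An rdag can be sketched as nodes holding an operation and its operand nodes; evaluating the root pulls data up from the leaves (a toy Python illustration with assumed names — a real dataflow engine streams tuples between processes rather than recursing in one process):

```python
# A node is (operation, operand_nodes). Leaves have no operands
# and produce base data; the root absorbs the query result.
def evaluate(node):
    op, operands = node
    return op(*[evaluate(child) for child in operands])

A = (lambda: [1, 2, 3], ())               # leaf: base fragment A
B = (lambda: [2, 3, 4], ())               # leaf: base fragment B
join_ab = (lambda a, b: [t for t in a if t in b], (A, B))
root = (lambda j: sorted(j), (join_ab,))  # root absorbs the result
result = evaluate(root)
```

Incoming edges of `join_ab` are its two operands; its outgoing edge points to `root`, the next operation performed on its output.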

2.2 Possibilities for Parallelism

Because each OFM resides on one processor, no parallelism is used during the execution of an operation on one fragment. The query execution strategy for PRISMA/DB allows both task-spreading and pipelining between operations, though. Both forms of parallelism can be distinguished in the example query of figure 1. The selections on relations A, B, C and D can be performed independently in parallel. As soon as the selections on A and B start producing tuples, the join over these two operations can start processing the incoming tuples. In this way, pipelining can be exploited. Task-spreading between independent operations has been used a lot in parallel DBMSs. Pipelining between operations that are in a producer-consumer relation is not used extensively, though, because the well-known algorithms for relational operations do not fit in a pipeline very well. The next section will show how pipelines consisting of more than two operations can be set up using so-called pipelining algorithms.

3 Join Algorithms

This section describes the execution characteristics of a linear join tree. Consider a four-way join between fragments A, B, C and D. Figure ?? shows the linear join tree for this query.
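One well-known pipelining join is a symmetric hash join, sketched below as an illustration (not necessarily the exact algorithm this paper develops). It keeps one hash table per input, so every arriving tuple can probe the opposite table and emit matches immediately, instead of waiting for either input to complete:

```python
from itertools import zip_longest

def pipelining_hash_join(left, right, key):
    # One hash table per input; an arriving tuple probes the opposite
    # table for matches, then is inserted into its own table.
    h_left, h_right, out = {}, {}, []
    # zip_longest mimics tuples arriving interleaved from two pipelines.
    for l, r in zip_longest(left, right):
        if l is not None:
            out.extend((l, m) for m in h_right.get(l[key], []))
            h_left.setdefault(l[key], []).append(l)
        if r is not None:
            out.extend((m, r) for m in h_left.get(r[key], []))
            h_right.setdefault(r[key], []).append(r)
    return out

L = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
R = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]
result = pipelining_hash_join(L, R, "k")
```

Because output is produced as soon as the first matching pair exists, such an operator fits in a pipeline of more than two operations: its output stream can feed the next join before its inputs are exhausted.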

4 Join order: Join graphs and Join-trees

5 A look into the future

References

[Alex88] W. Alexander and G. Copeland, Process and Dataflow Control in Distributed Data-Intensive Systems, Proceedings of the 1988 SIGMOD conference, Chicago, USA, June 1988.
[Aper82] P.M.G. Apers, Query Processing and Data Allocation in Distributed Database Systems, PhD Thesis, Mathematisch Centrum, Amsterdam, 1982.
[Brat84] K. Bratbergsengen, Hashing Methods and Relational Algebra Operations, Proceedings of the 10th conference on Very Large Databases, Singapore, August 1984.
[Brat89] K. Bratbergsengen and T. Gjelsvik, The Development of the CROSS8 and HC16-186 Parallel Database Computers, Proceedings of the 6th International Workshop on Database Machines, Deauville, France, June 1989.
[Bora85] H. Boral and S. Redfield, Database Machine Morphology, Proceedings of the 11th conference on Very Large Databases, Stockholm, Sweden, August 1985.
[Bora88] H. Boral, Parallelism in Bubba, Proceedings of the First International Symposium on Databases in Parallel and Distributed Systems, Austin, Texas, USA, December 1988.
[Ceri84] S. Ceri and G. Pelagatti, Distributed Databases: Principles and Systems, McGraw-Hill, 1984.
[Dewi84] D.J. DeWitt et al., Implementation Techniques for Main Memory Database Systems, Proceedings of the 1984 SIGMOD conference, Boston, USA, June 1984.
[DeWi85] D.J. DeWitt and R. Gerber, Multiprocessor Hash-Based Join Algorithms, Proceedings of the 11th conference on Very Large Databases, Stockholm, Sweden, August 1985.
[DeWi88] D.J. DeWitt, S. Ghanderarizadeh and D. Schneider, A Performance Analysis of the Gamma Database Machine, Proceedings of the 1988 SIGMOD conference, Chicago, USA, June 1988.
[Eich89] M. Eich, Research Topics in Main Memory Database Systems, Proceedings of the 6th International Workshop on Database Machines, Deauville, France, June 1989.
[Kers87] M.L. Kersten, P.M.G. Apers, M.A.W. Houtsma, H.J.A. van Kuijk and R.L.W. van de Weg, A Distributed, Main Memory Database Machine, Proceedings of the 5th International Workshop on Database Machines, Karuizawa, Japan, October 1987.
[Rich87] J.P. Richardson, H. Lu and K. Mikkilineni, Design and Evaluation of Parallel Pipelined Join Algorithms, Proceedings of the 1987 SIGMOD conference, San Francisco, USA, June 1987.
[Schn89] D.A. Schneider and D.J. DeWitt, A Performance Evaluation of Four Join Algorithms in a Shared-Nothing Multiprocessor Environment, Proceedings of the 1989 SIGMOD conference, Portland, USA, June 1989.
[Ston86] M. Stonebraker, The Case for Shared Nothing, Database Engineering 9.1, 1986.
[Vald84] P. Valduriez and G. Gardarin, Join and Semijoin Algorithms for a Multiprocessor Database Machine, ACM Transactions on Database Systems 9.1, March 1984, 133-161.
[Wils89] A.N. Wilschut, P.W.P.J. Grefen, P.M.G. Apers and M.L. Kersten, Implementing PRISMA/DB in an OOPL, Proceedings of the 6th International Workshop on Database Machines, Deauville, France, June 1989.
