SIMULATING TRANSACTION PROCESSING IN PARALLEL DATABASE SYSTEMS

Chris Bates1, Innes Jelly1, Ivan Lalis2, Peter Menhart2
1 Computing Research Centre, Sheffield Hallam University, Napier Street, Sheffield, S11 8HD, U.K.
2 Dept. of Computer Science and Engineering, Slovak Technical University, 812 19 Bratislava, Slovakia
e-mail: {c.d.bates | i.e.jelly}@shu.ac.uk, {menhart | lalis}@elf.stuba.sk
ABSTRACT
In this paper we discuss the design of a parallel database machine and the desirability of using simulation experiments of the machine at an early stage of the design process as a way of assessing the performance of the finished machine. We discuss the reasoning behind the choice of a distributed discrete-event simulation model, and design and implementation issues in this approach. Lastly we discuss how the simulation may be used to rapidly develop experiments which examine the behaviour of the database machine.
KEYWORDS
Database management systems, performance analysis, hierarchical object-oriented models, distributed simulation
1. INTRODUCTION
Modern database systems are increasing in complexity due to the ever greater processing requirements of end-users, who need both faster response times and increases in the volumes of data handled. It has been estimated that a single NASA project, the Earth Observing System, will produce 11,000 terabytes of data, roughly 1,000 times the size of the U.S. Library of Congress (Robinson 1991). The key requirement for any database system is no longer simply the ability to store data in a form that can be easily accessed, but the manipulation of massive data volumes combined with very high transaction rates, which for banking systems may reach 70,000 transactions per hour.
The last 10 years have seen considerable interest in the development of parallel database servers using advanced processing and data partitioning strategies whilst operating wholly within the relational model. Such machines are not merely academic research projects: Oracle has recently been ported onto parallel hardware from ICL, Meiko and nCube (Bates et al. 1995), and an implementation of Oracle on a 64-node nCube has been reported as achieving over 1000 transactions per second in a commercial environment (Bobbie 1994).

The development of a parallel machine can be very expensive and there is often no way of knowing how well the finished machine will perform. It is therefore highly cost-effective to assess expected performance levels at the start of the development process. This paper examines how an object-oriented toolset for building distributed simulation systems (Lalis and Menhart 1995) may be used to build a simulation of a parallel database machine. Rather than use an abstraction of a generic system, a recently developed architecture for scaleable parallel database machines is used as the fundamental processing unit (Kerridge 1995). A simulation-based approach supports the incremental development of the underlying database platform and means that the performance parameters of individual components can be fine-tuned in isolation. To assess performance the system must be tested under realistic operating conditions. The workload in this simulation is based upon a benchmark for parallel database systems that uses the processing pattern of a typical British high street bank as its starting point (Bates et al. 1995).

Section 2 of this paper discusses the database system itself and some of the rationale behind its development. Section 3 describes the simulation techniques used and the development of a simulation model of a parallel database machine.
Finally section 4 of the paper discusses how a simulation can be used to examine not only the design of the machine but fundamental aspects
of the use of a parallel database system such as data partitioning strategy and workload balancing.
2. THE ARCHITECTURE OF THE DATABASE SYSTEM
Recent work at the National Transputer Support Centre in Sheffield has led to the creation of a new architecture for scaleable parallel database systems based upon a main-memory storage control unit called the Data Access Component (DAC). The DAC is an intelligent storage controller which is able to extract structured data from partitioned relational tables and which provides significant performance improvements by preprocessing and filtering data so that the maximum possible amount of processing occurs close to the storage system. This ensures that as little data as possible is passed between processes, reducing the load on the internal communication systems of the database.

Partitioning is a method of spreading a relational table across a number of storage units, allowing concurrent accesses to each table. Results can be returned more rapidly because a single transaction which affects only one table can access every partition of that table at the same time. Where a transaction with a long residency time, such as a join, is taking place, a partition may move on to another task as soon as it has returned all of its required data. Clearly in a parallel environment there are significant performance advantages to be gained from this technique, not least because the database machine can be load-balanced for the most common types of transaction (Gray and Reuter 1993).

The DAC provides most of the complex processing associated with database systems, including select/project, grouping, sorting and set operations, as well as updating the stored data and maintaining dynamic local indexes. However, the DAC does not support join operations. In any database system these are amongst the most complex and time-consuming operations; they are not supported by the DAC because joins will usually access many partitions and so can most efficiently be performed by the access processes.
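Partition placement of this kind can be sketched as simple placement functions. The following is an illustrative assumption only, not the actual DAC implementation: the function names, the key type and the fixed partition count are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative partition count; a real machine would configure this.
const std::size_t NUM_PARTITIONS = 4;

// Hash partitioning: spreads keys evenly, good for exact-match transactions.
std::size_t hashPartition(long accountKey) {
    return static_cast<std::size_t>(accountKey) % NUM_PARTITIONS;
}

// Range partitioning: co-locates contiguous key ranges, good for range scans.
std::size_t rangePartition(long accountKey, long keysPerPartition) {
    std::size_t p = static_cast<std::size_t>(accountKey / keysPerPartition);
    return p < NUM_PARTITIONS ? p : NUM_PARTITIONS - 1;  // clamp overflow keys
}

// Round-robin partitioning: balances insert load regardless of key skew.
std::size_t roundRobinPartition() {
    static std::size_t next = 0;
    return next++ % NUM_PARTITIONS;
}
```

The choice between these placements trades balanced load against locality, which is one of the tuning questions the simulation is intended to explore.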
Whilst there may be performance advantages in joining at the DAC level when all of the data to be used in the join is held on a single partition, there may be costs for the system as a whole in monopolising a single partition in this way. It is therefore nearly always better to use the DAC to retrieve the data and then perform the join elsewhere. This has implications for the use of the
DAC in a transaction processing machine. Transactions are usually short-lived operations which access only a few tuples. The transactions from the banking workload do not require joins, so the simulation is not invalidated by their absence. More complex workloads might have tested the limits of the validity of this initial simulation, but as the simulation model becomes more complete these “difficult” workloads may be applied to it.

Within the parallel database machine the DAC is used as the fundamental processing unit (Figure 1). Although originally developed for main-memory systems, each DAC may be connected to one or more SCSI discs, with the remainder of the machine comprising the processes needed to handle inputs and manage retrieved data: this is the design used in our approach. Transactions arriving at the database system enter a queue awaiting processing. Each queued transaction is sent to the correct partition, or set of partitions, by a global transaction manager which also controls the commit or rollback of completed transactions.

Figure 1: The Generic Database Machine

At each partition a local transaction manager (shown as the Access Process in Figure 1) is responsible for queuing the transactions and ensuring that they all have access to the DAC, as well as requesting permission to perform commit or rollback from the global manager. The local transaction manager must also attempt to manage local disk buffers intelligently. A rule of thumb indicates that on a typical system 80 per cent of transactions access just 20 per cent of the data (Gray and Reuter 1993), so there are clearly large performance gains to be made through the efficient use of buffers.
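A minimal least-recently-used buffer sketch illustrates the idea: under an 80/20 access pattern, a cache holding a small fraction of the pages can absorb most requests. This is an illustrative sketch only; the Access Process buffer manager is not specified in this paper, and the class and method names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Fixed-capacity page buffer with least-recently-used eviction.
class BufferPool {
    std::size_t capacity;
    std::list<long> lru;   // front = most recently used page id
    std::unordered_map<long, std::list<long>::iterator> index;
public:
    explicit BufferPool(std::size_t pages) : capacity(pages) {}

    // Returns true on a buffer hit, false when the page had to be "read".
    bool access(long pageId) {
        auto it = index.find(pageId);
        if (it != index.end()) {                 // hit: move page to the front
            lru.splice(lru.begin(), lru, it->second);
            return true;
        }
        if (lru.size() == capacity) {            // miss: evict the LRU page
            index.erase(lru.back());
            lru.pop_back();
        }
        lru.push_front(pageId);
        index[pageId] = lru.begin();
        return false;
    }
};
```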
3. THE DESIGN OF THE SIMULATION MODEL
The discrete-event model of the parallel database machine is specified in a hierarchical, object-oriented manner which allows both flexibility and stepwise refinement; altering one part does not invalidate the entire model or require it to be completely rebuilt. The implementation is based upon an established toolset, written in C++, which permits the development of either sequential or distributed simulations (Lalis and Menhart 1995). The distributed simulation system has been developed to run on a network of transputers, which allows a flexible approach to be taken to the placement of processes and to the structure of the hardware, and enables reconfiguration for more processors. The toolset provides separate run-time and communications modules and could be ported to other hardware architectures.

The simulation model comprises both static and dynamic objects. There are two groups of static objects, or entities: pre-defined and user-defined; the structure of the model is specified through their interconnection. Fundamental pre-defined objects (sources, sinks, branches, clones, merges, etc.) are used to create, terminate and route dynamic objects through the model. The user-defined entities represent the application-specific parts of the simulation model (in this case discs, transaction generators, etc.). They are derived from the pre-defined entities by customising their behaviour. If details have to be added to increase the accuracy of the representation of an object, an entity may be replaced by a new one which uses the same interface. If the behaviour of an entity is too complex, it can be divided into a network of simpler entities encapsulated within a special composite entity, creating a complex object with the same interface as the entity being replaced. The concept of the composite entity allows the model to be structured hierarchically.

Dynamic objects contain a set of user-defined attributes. They are launched from source entities, routed through the structure of the model, possibly changing the values of their attributes along the way, and are terminated in sink entities. Dynamic objects are often called transactions, but that term is already used here with a different meaning.
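The interplay of static entities and dynamic objects can be sketched as follows. This is not the toolset's actual API (that is shown in figure 4); the class names, the single-successor routing and the attribute shape are simplifying assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

struct DynObject { int attribute; };  // dynamic object with one user attribute

// Static entity base: by default, forwards a dynamic object to its successor.
struct Entity {
    Entity* next = nullptr;
    virtual void receive(DynObject obj) { if (next) next->receive(obj); }
    virtual ~Entity() = default;
};

// Source: launches dynamic objects into the model.
struct Source : Entity {
    void launch(int value) { receive(DynObject{value}); }
};

// User-defined entity: customises behaviour by changing an attribute in flight.
struct Doubler : Entity {
    void receive(DynObject obj) override { obj.attribute *= 2; Entity::receive(obj); }
};

// Sink: terminates dynamic objects, recording them for inspection.
struct Sink : Entity {
    std::vector<DynObject> terminated;
    void receive(DynObject obj) override { terminated.push_back(obj); }
};
```

Because every entity shares the `receive` interface, an entity can be swapped for a refined one, or for a composite network with the same interface, without disturbing the rest of the model, which is the stepwise-refinement property described above.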
Although the behaviour of the model has been described as event-oriented, the standard non-object solution to event dispatching, one huge switch with an action defined for each event occurrence, has been replaced with a more powerful event-addressing mechanism. The address of the destination entity is stored as an attribute of the scheduled event in the event list so that events can be sent directly to the desired entity. This principle allows the behaviour of an entity to be described as a set of responses to specific events. The generic algorithm for the simulation run is based upon a formal approach presented in (Cingel and Safarik 1993) and adapted for distributed simulation.

The encapsulation of entity behaviour, together with the possibility of hierarchical structure specification, is suitable for modelling huge systems which can be both divided into several parts, each assigned to a different modeller, and developed through stepwise refinement. This was the approach taken when creating the simulation model of the parallel database machine.

Figure 2: The structure of the simulation model (Transaction Requester, Transaction Generator, Transaction Manager, Data Access Components and Results Gathering, linked by event triggers, flow control and transaction start/end times)

The structure of the proposed simulation model (figure 2) reflects both the architecture of the database system and the external systems which are required for performance evaluation. It consists of several entities at the highest level; some of them are composite entities, refined at lower levels of the hierarchy. The Transaction Manager and Data Access Components reflect parts of the database machine as shown in figure 1. The Transaction Generator creates the workload to be “processed” by the simulation of the database machine. This workload is structured like a real workload as found in a banking environment and mimics the actions of users of a transaction processing system. The Transaction Requester allows the number of transactions resident in the database to vary so that
the simulation may show the database using all of its resources to process a single transaction, or having to manage many transactions concurrently.

A graphical representation such as figure 2 can be converted directly into a textual description using a structure specification language, as outlined in figure 3. The structure specification language was developed to express an intermediate representation of the model structure; whilst easily readable, it is nevertheless suitable for conversion into the internal representation of a toolset-based simulation system.

disim(SimProc ANY DATA(ENDTIME)
  CONNECT(0:0>1:0 1:0>2:0 2:0>5:0 2:1>3:0 3:0>2:1 2:2>4:0 4:0>2:2)
  source(Req DATA(Trigger) PORTS(0 1))
  TransGen(Gen DATA(Query) PORTS(1 1))
  TransMan(Man DATA(2..) PORTS(3 3))
  composite(DAC1 DATA(..) PORTS(1 1)
    ATTACH(0>0:0 01:0 1:0>0:1)
    Driver(driver DATA(..) PORTS(2 2))
    Disk(disk DATA(..) PORTS(1 1))
  )
  composite(DAC2 DATA(1..) PORTS(1 1)
    ATTACH(0>0:0 01:0 1:0>0:1)
    Driver(driver DATA(..) PORTS(2 2))
    Disk(disk DATA(..) PORTS(1 1))
  )
  ResGath(Gath DATA(..) PORTS(0 1))
)

Figure 3: Structure specification of the simulation model

Each description in figure 3 defines the class and name of an entity, generic data parameters retrieved at run-time, and the number of input and output ports (the interface to other entities). The Data Access Components are composite entities, providing coupling of their own ports (ATTACH) as well as of the ports of hierarchically lower entities (CONNECT). The SimProc is the root of the hierarchy and, like a composite entity, describes the port couplings of all entities at the next level; its ATTACH statement is omitted because the SimProc has no interface to other entities.

The parts of the database system are modelled as independent objects interacting through the passing of data or event messages, offering a high degree of “useful” parallelism for an efficient distributed simulation. Conservative synchronisation algorithms are suitable because of the promising lookahead potential. It is possible to specify the mapping of entities to processors in the structure specification language, allowing automated model decomposition for the distributed simulation (Solcany et al. 1995).
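Assuming each CONNECT token in figure 3 couples an output port of one entity to an input port of another, in the form entity:port>entity:port, such a token could be parsed as below. The reading of the notation, the struct and the function name are assumptions for illustration; the paper does not give the language's grammar.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One port coupling: output port fromPort of entity fromEntity feeds
// input port toPort of entity toEntity.
struct Coupling { int fromEntity, fromPort, toEntity, toPort; };

// Parse a token such as "2:1>3:0". Returns false if the token does not match.
bool parseCoupling(const std::string& token, Coupling& c) {
    return std::sscanf(token.c_str(), "%d:%d>%d:%d",
                       &c.fromEntity, &c.fromPort, &c.toEntity, &c.toPort) == 4;
}
```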
To achieve the required degree of precision in the behaviour specification, parts of the program code of the actual parallel database machine have been incorporated into the simulation model implementation. An example of this, for the transaction generator, is shown in figure 4:

typedef struct {
    int type, account, branch, ...;
} trans_t;

int GenerateTransaction(trans_t *);   // fill in a trans_t structure;
                                      // original machine code is used here

class Query : public Trans {          // all dynamic objects are derived
                                      // from the Trans class
public:
    trans_t tr;
};

class TransGen : public entity {      // all user-defined entities
                                      // are based on the entity class
protected:
    BOOL OnArrival(TRANSTYPE *);      // method called when an EV_ARRIVAL
                                      // event is to be executed
public:
    TransGen(...);
};

//-------------------------------------
TransGen::TransGen(...) : entity(...) // the base class constructor
{                                     // is called first
    InitGenerator();                  // initialisation (if needed)
}

EVENT_MAP(TransGen)                   // mapping of all events to methods
    ON_EVENT(EV_ARRIVAL, OnArrival)
    ...
END_MAP

BOOL TransGen::OnArrival(TRANSTYPE *tr)
{                                     // virtual method serving the
                                      // EV_ARRIVAL event
    Query *ltr = (Query*)createObject("Query");
                                      // create an instance of Query
    GenerateTransaction(&ltr->tr);    // fill in its attributes
    delete tr;                        // remove the Trigger transaction
    ExecuteEvent(EV_DEPARTURE, ltr);  // the Query is sent out
    return FALSE;                     // no further serving in the base
                                      // class is needed
}

Figure 4: Incorporating database machine code into the simulation model
4. EVALUATING THE DESIGN OF THE DATABASE SYSTEM
The simulation system described in the previous section is flexible enough to allow the rapid development of a number of experiments. It can be used to examine the behaviour of the DAC when attached to more than one disc; different indexing and cache utilisation strategies can be assessed, as, more obviously, can data partitioning and transaction throughput. One of the most important aspects of using this simulation system is that it allows the design of a series of experiments to examine what happens when several transactions require concurrent access to the same disc or partition. A variety of algorithms for handling this situation can be assessed to see their effect upon the performance of the machine. Through such experiments it is hoped that an algorithm can be developed which maximises transaction throughput without substantially increasing the response time for individual transactions.
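As a sketch of the kind of metric such experiments might report, the following computes the mean response time for a batch of transactions served first-come-first-served at a single partition. The scheduling discipline, the batch-arrival assumption and the function name are all illustrative; the paper does not specify the algorithms under test.

```cpp
#include <cassert>
#include <vector>

// Mean response time (wait + service) for transactions that all arrive at
// time 0 and are served first-come-first-served by one partition.
double meanResponseTimeFCFS(const std::vector<double>& serviceTimes) {
    double clock = 0.0, total = 0.0;
    for (double s : serviceTimes) {
        clock += s;      // this transaction completes at 'clock'
        total += clock;  // its response time equals its completion time
    }
    return serviceTimes.empty() ? 0.0 : total / serviceTimes.size();
}
```

Comparing such figures across candidate scheduling algorithms, under identical simulated workloads, is exactly the kind of experiment the flexibility of the model makes cheap to run.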
The structure specification language has proved itself to be a powerful, user-friendly tool for setting simulation parameters at run-time. It allows such fundamentals as the size and structure of the workload, the database and the database machine to be altered dynamically. Work is in progress to validate the simulation through experimentation on a working implementation of the data access components using the Sheffield Database Benchmark. Once the simulation has been shown to provide an accurate representation of a DAC-based transaction management system, experiments will be performed to examine the effects of transaction interleaving and data partitioning strategies.
There are two important reasons for choosing to use a parallel database machine, speed-up and scale-up, both of which can be examined through a simulation. When designing a database machine scale-up is a key characteristic: such a machine has to be able to handle problems from the merely large through to the massive. Scale-up can be achieved at the design stage by adding more discs and processors; however, once a machine is in use there is little chance of increasing its physical size without removing and repartitioning all of the data on the machine. A simulation is very useful here as it can be scaled easily, which helps in matching the size of machine to the size of problem. A simulation may also be used to examine the effects of different data partitioning strategies, round-robin, range or hash, and the granularity of the partitioning itself.

5. CONCLUSIONS
This paper has shown that it is possible to examine the performance of a parallel database machine through the use of a distributed simulation running on a network of transputers. By using an object-oriented approach, a DBMS simulation model can be developed which is highly representative of a “real” machine. This allows powerful and informative experiments to be performed quite readily upon the simulation; the results of such experiments can then be fed back into the design of the DBMS, allowing the system to be tuned and enhanced in a rapid and cost-effective manner.

Acknowledgement
This work was supported by the Copernicus project no. CP93:6638.

REFERENCES
C.Bates, I.Jelly, J.Kerridge, 1995. “Modelling Test Data for Performance Evaluation of Large Parallel Database Machines”, to appear in Distributed and Parallel Databases Journal.
P.O.Bobbie, 1994. “Clustering Relations of Large Databases for Parallel Querying”, Proceedings of the 27th Annual Hawaii International Conference on Systems Science, IEEE.
V.Cingel, J.Safarik, 1993. “Hierarchical object-oriented representation and modelling of timed discrete-event systems”, Proceedings of the European Simulation Multiconference ESM’93, Society for Computer Simulation, pp. 46-50.
J.Gray, A.Reuter, 1993. “Transaction Processing: Concepts and Techniques”, Morgan-Kaufmann.
J.Kerridge, D.Walter, R.Guiton, 1995. “W-SQL: An Interface for Scaleable, Highly Parallel Database Machines”, Proceedings of the 13th British National Conference on Databases.
I.Lalis, P.Menhart, 1995. “Object Oriented Toolset for Sequential and Distributed Simulation”, Proceedings of the European Simulation Multi-Conference ESM’95, Society for Computer Simulation, pp. 604-608.
B.Robinson, 1991. “NASA Data Could Skyrocket”, Electronic Engineering Times, Vol. 29.
V.Solcany, R.Skultety, J.Safarik, 1995. “Simulation model decomposition in conservative parallel discrete event simulation”, Proceedings of the European Simulation Multi-Conference ESM’95, Society for Computer Simulation, pp. 41-45.