The Logical Protocol Unit

Erich Schikuta
Institute of Applied Computer Science, Dept. of Data Engineering, University of Vienna
Rathausstr. 19/4, A-1010, Vienna, AUSTRIA
E-mail: [email protected]

Abstract. The performance of distributed systems depends heavily on the design of the interprocess communication facilities of the underlying system architecture. In this paper we aim towards an optimal communication protocol for distributed system architectures. We present different approaches to the logical structure of the protocol of interprocess communication (IPC) systems and analyze the resulting runtime behavior heuristically. We introduce the notion of a logical protocol unit (LPU). An LPU groups the basic operations (or actions) of the system logically into units, which are transferred in one single block via the interprocess communication facilities of the underlying operating system. An algorithm for the calculation of specific LPUs for software applications in practice is presented. It is shown heuristically that the overall performance of real systems can be improved dramatically by employing different protocol structures based on the LPU approach. The testbed for the heuristic analysis was an advanced relational database system.

1. Introduction

Large software systems generally consist of several distributed processes, which execute virtually in parallel (time-sliced) on sequential machines or physically in parallel on parallel system architectures. Partitioning a large program into smaller processes offers higher performance through workload sharing, smaller program sizes (smaller main memory requirements), and a logical structuring of tasks, to name only a few advantages. This is mostly accomplished by a so-called client-server architecture of the software system. Such systems basically consist of a master process (often denoted as the ‘server’), which manages the client processes. Two different server types can be distinguished, the managing server and the processing server. Managing servers only administrate the client processes, accumulate the results calculated by the clients, and establish the interface to the user or to other application programs. This is the common situation in decentralized architectures, e.g. high performance computing applications. In contrast, processing servers carry the main workload of the system and communicate with client processes only to provide results to them. This situation can be found in centralized systems, e.g. database systems.

Thus, one of the most crucial parts of system development, which influences the overall performance behavior of the system, can be the design of the communication administration. Basically, information exchange between collaborating processes is realized by the interprocess communication facilities [Rochkind, 1985] of the operating system of the underlying system architecture. Both the efficiency and the usage of these facilities are of crucial importance for the performance of the whole system architecture. The efficiency depends on the operating system and cannot be influenced by the application. However, the way the IPC facilities are used is defined by the design of the system and is therefore controllable and tunable. This leads to a threefold approach to gain higher performance:
• semantic analysis of the communication streams of the system,
• design of the optimal logical information structure, and
• exploitation of the characteristics of the underlying IPC tools of the operating system, and adaptation to them, as far as possible.

In this paper we develop a method for a structural approach supporting this design process. The paper is organized as follows. In the second section we give a short survey of similar approaches found in the literature. In the third section we introduce our approach, the logical protocol unit, which is the functional basis of our design, and define an algorithm for its calculation. In the fourth section we show how a logical protocol unit (LPU) can be mapped to the underlying IPC system. A heuristic analysis based on a real implementation in an advanced relational database system is then presented. Finally we show the results of the proof-of-concept implementation, which justify our approach.

2. Communication Protocols

The design, implementation and analysis of communication protocols, in general and for distributed systems in particular, have gained wide coverage in the literature. The published papers can generally be classified into three categories: the formal specification and verification of new protocols, the design and implementation of novel approaches, and the performance study and analysis of implemented protocols. A general and highly recommendable survey of protocols was presented by Tanenbaum in [Tanenbaum, 1981a] and in his book [Tanenbaum, 1981b].

In the area of formal methods a number of papers were published on algebraic specification and logical verification, such as [Blumer, Tenney, 1982], [Bochmann, Gecsei, 1977] or [Bochmann, Raynal, 1983]. Especially noteworthy is the work of Peterson et al. [Peterson, Buchholz, Schlichting, 1989], who took semantic knowledge of the transferred protocol information into account for the protocol design. They developed the Psync mechanism, which explicitly encodes the partial ordering of externally visible events in a distributed communication network.

The design of specialized transport protocols is an active area of research. A number of approaches were published which propose specialized protocols for specific hardware and software architectures and application domains. A good entry point to this area is the paper by Watson et al. [Watson, Mamrak, 1987], which examines transport protocol mechanisms and implementation issues. They argue that a general purpose transport protocol can be effective in a wide range of distributed applications. This idea was also followed in our approach.

The third area, performance-oriented issues in the design and implementation of communication protocols, is covered by a wealth of papers, such as [Carbrera, Hunter, Karels, Mosher, 1988] or [Son, Chang, 1990], to mention only a few. They study the impact of different processors and of network hardware interfaces across machines, as well as the effect of the load on hosts and communication media, to capture the dynamic behavior of the communication facilities.

3. The Logical Protocol Unit

For the following discussion we introduce the term logical protocol unit (LPU). It can be described informally as a collection of information (operations and data) that is transferred jointly by the IPC facilities. It can be seen as an atomic transaction changing the state of the system from one synchronized situation to another. Synchronized means that the communicating processes can react sensibly to exception situations arising in the system.

3.1 Atomicity

Atomicity in our sense of protocol units is similar to the definition of an atomic action used in the area of transaction processing in database systems. An atomic action is a sequence of actions which are either performed in their entirety or not performed at all [Ceri, Pelagatti, 1984]. An action in the context of transaction processing is basically a read or write operation of the database system [Vossen, 1991]. An action in our terminology can be any transfer of information between processes.

3.2 Trigger-Actions and Information-Actions

On closer inspection we have to distinguish between trigger-actions and information-actions.

Definition: A trigger-action forces the system to move from its current state to a new state. An information-action delivers the data necessary for the transition between states. Informally we can speak of triggers and information.

Example: We assume a database application and the insertion and deletion of an employee data record into and from the employee table of the database. (Generally we use database system transactions as examples, which corresponds to the heuristic analysis that follows.) The system consists of a number of client processes, such as user interfaces or database applications, and a centralized database server. In our notation this is called a ‘processing server’ situation. The insertion transaction consists of the information-action, which is the transport of the employee's information, such as employee-number, name, department-number, etc., and the trigger-action, which executes the insertion or deletion of the record in question.

A conventional database transaction differs from an LPU transaction. In database systems normally more than one action (usually the number of actions stored in the transaction table) has to be rolled back to react to an exception. This 'reaction' is generally a recovery, which can be triggered by a client request or automatically by a special system state, like a system crash. The length of a transaction atom is defined by the placement of a BOT (begin of transaction) and an EOT (end of transaction). This information is provided by the client program. For an LPU this definition is not sufficient, because it says nothing about the actual length of the unit sequence, i.e. the number of actions.
The maximal number of actions of an LPU is restricted by the constraint that the system has to react to any exception situation with a single (and obviously final) recovery action. Therefore the system never has to roll back several LPUs to reach a consistent state again, but only one.

3.3 LPU Definition

From this follows the definition of an LPU (slightly different from the definition of a database transaction known from the literature [Vossen, 1991]).

Definition: A Logical Protocol Unit (LPU) is a (finite) sequence $a_1, \ldots, a_n$ of basic actions which are either performed in their entirety or not performed at all, and where the client program can react to an exception situation with a single recovery action.

In the ‘insertion’ example of section 3.2 the whole transaction is an LPU. One possible exception is a duplicate employee-number. The server rejects the insertion and the client process can in turn request a new number for the employee record. Another, less obvious, exception is the use of a wrong data type, e.g. a character input for the department-number (supposing there is no check in the user interface). This is detected during the evaluation of the data provided by the information-action, and the user is informed.

3.4 LPU Computation

The computation of all possible LPUs can generate a rather large number of action sequences. Therefore we define an algorithm which performs this task systematically. First all trigger-actions $T = t_1, \ldots, t_n$ have to be determined, and all information-actions $I = i_1, \ldots, i_m$ have to be categorized into groups $G = g_1, \ldots, g_k$, where $k \le m$. Obviously the following properties are valid:

$$\forall\, i_l \in I \;\; \exists\, g_j \in G : i_l \in g_j, \qquad g_j \cap g_l = \emptyset \ \text{ if } j \ne l, \qquad g_j \ne \emptyset, \qquad \bigcup_{j=1}^{k} g_j = I$$

In other words, each information-action is mapped to exactly one group and the groups are pairwise disjoint. Further, the union over all groups comprises all information-actions. Each group contains all actions which provide similar information (in our example, the group providing the insertion data). Then we have to map the groups G to their possible trigger-actions T. A group can be mapped to more than one trigger-action. Finally the possible LPUs are described by all possible sequences of elements of the groups mapped to the respective trigger-actions. This can be expressed formally by the following algorithm.

Algorithm: LPU creation

    algorithm createLPU
        determine all trigger-actions T = t_1, ..., t_n
        group all information-actions into groups G = g_1, ..., g_k
        for each t ∈ T
            find all g ∈ G which belong to t, producing the set of groups G_t
            an LPU is defined by the sequence [i_j] t, where each i_j is taken from a group in G_t
        endfor
    end createLPU

We use the commonly known grammar notation. Actions contained in square brackets ’[’ and ’]’ must be applied at least once.

The runtime expense of the algorithm can be calculated easily. Without loss of generality we assume that the number of actions (information-actions and trigger-actions together) is n. The identification of all trigger-actions is performed by a sequential scan over all actions, resulting in an expense of order n, $O(n)$. The same is valid for the grouping of the remaining information-actions. Now we have to find the respective groups for each trigger-action t. The number of triggers is of order n and the number of groups is of order n as well, which results in an expense of $n^2$. Therefore we can conclude that the runtime expense of the ‘createLPU’ algorithm is of order n squared, $O(n^2)$.

Example: Let us consider a (rather simple) database example for the insertion and deletion of records. The set of trigger-actions T consists of the insertion action $t_1$ and the deletion action $t_2$, therefore $T = \{t_1, t_2\}$. The record data is transferred via a number of specialized information-actions representing the transfer of the record attributes, one for each data type, $I = \{a_1, a_2, a_3, a_4\}$ (e.g. short, long, float, and string respectively). Obviously only one group exists, because the attribute actions belong logically together, resulting in $G = \{g_1\}$ with $g_1 = \{a_1, a_2, a_3, a_4\}$. The information-action group $g_1$ is mapped to both triggers $t_1$ and $t_2$. This results in 2 LPUs, which are defined by all possible sequences of attribute actions of $g_1$ followed by a trailing insertion or deletion trigger respectively. This is the same result we determined informally above. The heuristic analysis in section 5 presents a more comprehensive example of a real application.
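As a minimal sketch of the three ‘createLPU’ steps, the following Python code reproduces the insertion and deletion example. The names create_lpu, is_lpu, group_of and group_triggers, as well as the convention that trigger names start with "t", are assumptions made for illustration only; they are not taken from the GFDBS implementation.

```python
def create_lpu(actions, is_trigger, group_of, group_triggers):
    """Return a dict mapping each trigger-action to the set of
    information-actions that may precede it in an LPU."""
    # Step 1: sequential scan for the trigger-actions, O(n).
    triggers = [a for a in actions if is_trigger(a)]

    # Step 2: sort the remaining information-actions into groups, O(n).
    groups = {}
    for a in actions:
        if not is_trigger(a):
            groups.setdefault(group_of[a], set()).add(a)

    # Step 3: for each trigger collect the groups mapped to it, O(n^2).
    lpus = {}
    for t in triggers:
        allowed = set()
        for name, members in groups.items():
            if t in group_triggers[name]:
                allowed |= members
        lpus[t] = allowed
    return lpus


def is_lpu(sequence, lpus):
    """Check the pattern [i_j] t: at least one permitted
    information-action followed by exactly one trailing trigger."""
    if not sequence:
        return False
    *infos, last = sequence
    return (last in lpus
            and len(infos) >= 1
            and all(i in lpus[last] for i in infos))


# Hypothetical encoding of the example: a1..a4 are attribute transfers,
# t1 is the insertion trigger and t2 the deletion trigger.
actions = ["a1", "a2", "a3", "a4", "t1", "t2"]
group_of = {a: "g1" for a in ["a1", "a2", "a3", "a4"]}
group_triggers = {"g1": {"t1", "t2"}}

lpus = create_lpu(actions, lambda a: a.startswith("t"),
                  group_of, group_triggers)
```

Under these assumptions, lpus holds one entry per trigger (two LPU patterns in total), and is_lpu(["a1", "a3", "t1"], lpus) accepts a sequence of attribute actions with a trailing insertion trigger.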

4. The Physical Protocol Unit

For the realization of the protocol, in-depth knowledge about the LPU and the size of the message unit (the unit which is sent via the IPC facilities) is necessary. A conventional approach would send each action in its own message unit, which is obviously not optimal. The concept of an LPU can be used to develop a more efficient method. The system can react to an exception within an LPU in a similar way as to an exception within a normal action. So we can expand the normal message unit to hold a group of actions. We call a physically transferred message unit with respect to the LPUs a physical protocol unit (PPU).

Definition: A physical protocol unit (PPU) is the unit of information which is transferred between communicating processes via the IPC facilities of the underlying operating system.

For the design of a suitable (and hopefully optimal) communication protocol we have to determine the optimal length of a PPU. Optimal in our context means the highest practical throughput of the system. Therefore we state the following theorem.

Theorem: The optimal PPU is the PPU which can accommodate the longest LPU.

This theorem is informally supported by the common ‘rule of thumb’ that "few larger units outperform many small units", which can be found in the literature [Rochkind, 1985]. We will support this theorem heuristically by implementing protocols with different PPU lengths for an existing database system and measuring the overall throughput of the system.
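The effect of the PPU length on the number of transferred messages can be illustrated with a small Python sketch. It assumes that a PPU never mixes actions of different LPUs, so that an LPU of length k needs ceil(k / p) messages for a PPU that holds p actions. The function names pack_lpu and messages_needed and the sample LPU are hypothetical, and the resulting counts are plain arithmetic, not the measurements reported in section 5.

```python
import math


def pack_lpu(lpu, ppu_size):
    """Split one LPU (a list of actions) into PPUs of at most ppu_size actions."""
    if ppu_size < 1:
        raise ValueError("ppu_size must be at least 1")
    return [lpu[i:i + ppu_size] for i in range(0, len(lpu), ppu_size)]


def messages_needed(lpus, ppu_size):
    """Count the messages for a list of LPUs, assuming no PPU spans two LPUs."""
    return sum(math.ceil(len(lpu) / ppu_size) for lpu in lpus)


# Hypothetical LPU: seven information-actions followed by one trigger-action.
lpu = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "t1"]

# Message counts for the PPU sizes 1, 2, 10 and 25.
counts = [messages_needed([lpu], size) for size in (1, 2, 10, 25)]
# counts == [8, 4, 1, 1]
```

Once the PPU can hold the whole LPU (size 10 in this sketch), a larger PPU does not reduce the message count any further, which is the intuition behind the theorem.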

5. Heuristic Analysis

The Grid File Database System (GFDBS) is an advanced relational database system, which was developed during the last few years at the University of Vienna [Schikuta, 1990]. The internal structure of the GFDBS is the Grid File [Nievergelt, Hinterberger, Sevcik, 1984], a dynamic multi-dimensional data structure, which is well suited to storing complex objects and to supporting range queries. Based on the properties of the Grid File, a parallelized version of the GFDBS for distributed memory architectures, the PARGRID database system [Schikuta, 1991], is being developed. The GFDBS is a typical example of a client-server system. It consists of a database server (processing server) and several application processes (SQL, QBE, and client processes) communicating by message queues. All client programs communicate with the server process via the IPC facilities using a defined set of database server commands (creation and deletion of a file, input, output, update, query, etc.). All data records (input and output) are transferred via the IPC facilities, too.

5.1 GFDBS Commands

The interface of the GFDBS server process to the application (or client) processes comprises commands for the following operational groups:
• Communication with the server
• Creation and deletion of tables
• Manipulation of tables (insertion, deletion, and update of tuples in the tables)

A more comprehensive description of these commands is presented in the Appendix of the paper.

5.2 Computation of the LPUs for the GFDBS

A sequence of GFDBS commands forms a logical unit of execution for the server. A conventional communication protocol sends each command to the server on its own. Instead, we can determine the LPUs of the server commands and send them via the PPUs of the protocol. In the following, typical LPUs are described by example. We concentrate on the creation and the query of a GFDBS table only. Further database transactions can be handled in a similar manner.

According to the given algorithm ’createLPU’ we first determine all trigger-actions. We find these actions:

GFInitCreate    (exception: table exists)
GFCreate        (exception: attribute exists)
TInit           (exception: table unknown)
TQuery          (exception: attribute unknown)

The possible exceptions to the trigger-actions are given in brackets. These exceptions force the server process to terminate the activity and recover, and in turn the client process to react accordingly.

5.2.1 Creation of a Table

The creation of a table consists of 3 actions:
• start,
• attribute definition, and
• creation.

GFInitCreate    (start of the creation process)
ADef*           (a sequence of attribute definitions)
GFCreate        (creation of the table)

The ’ADef*’ actions are grouped together in an attribute information group. The mapping of the groups (in this case only one) yields 2 LPU sequences. GFInitCreate is sent to the server as an LPU of its own. This is in accordance with the definition of an LPU, because the client process must be able to react to an exception, such as the situation that the table already exists. The ADef* commands (a sequence of attribute definitions) represent the information-actions. The GFCreate command represents the trigger-action. The client program can react to an exception of the ADef* commands, e.g. the erroneous specification of identical attribute names. The operation of a table creation is therefore split into 2 LPUs.

5.2.2 Query of a Table

The query of a table consists of 3 actions as well:
• start,
• definition of the query restrictions, and
• execution.

TInit     (start of the query definition)
APut*     (definition of the restrictions)
TQuery    (start of the query)

This case is handled in a similar way as the table creation. TInit represents one LPU. A possible exception is, e.g., an unknown Grid File. The APut* commands again represent the information-actions (a sequence of attribute values defining query predicates). TQuery is the trigger-action, which concludes the LPU. In the case of an exception (e.g. attribute not available) the client program is informed and can react with only one action. The query transaction is therefore split into 2 LPUs. The same considerations apply in a similar manner to the insertion, update, deletion and retrieval transactions of tuples.

5.3 Performance Evaluation

For the evaluation of different PPU lengths a standardized performance test suite was created [Watzer, 1990]. A mix of frequently used database transactions was built and the transaction times were recorded. The following database transactions were analyzed:
• creation of a table,
• insertion of tuples,
• range queries, and
• point queries.

A range query denotes the type of database query where the key values of the query are defined by intervals on the attribute domain. The result comprises an area of the multi-dimensional space of the value domain containing (in most cases) a number of (different) result records. In contrast, a point query requests a single record, which is defined by specifying values for all key attributes.

The size of the PPU sequence (number of blocked commands) was varied. The following charts show the performance of the system for different PPU sizes (block sizes). The performance behavior is shown by two characteristics:
• the number of sent message blocks, and
• the consumed time.

The basis of the test metric was the size of the largest basic action. This is represented in the figures by the PPU size of 1. In the conventional case at least one basic action can be transferred with one message unit.
Thus all numbers (number of sent messages, execution times) which are smaller than the values for size 1 (the conventional situation) represent an improvement achieved by the LPU approach. All test runs were performed for 50 Grid Files with 2 to 10 attributes. In the following only the average values for Grid Files with 7 attributes are shown. This represents the average file size in our application profile. For increasing numbers of attributes the results show that the size of the LPUs has an even stronger influence on the performance of the system.

As shown in Figure 1, once the size of the PPU is large enough to hold an LPU completely, no further performance gain can be achieved by increasing the PPU size (see range and point queries for PPU sizes 10 and 25). Therefore we can conclude that we always have to find the largest possible LPU (i.e. the LPU with the longest action sequence) in our system and implement PPUs which can hold this LPU.

Figure 1: number of transferred messages versus PPU size (y-axis: number of messages, 0 to 35; x-axis: size of PPU, 1, 2, 10, 25; series: point query, range query, insertion, creation)

Figure 2 gives the amount of consumed time for the same test profile. Finally, Figure 3 shows the overall accumulated execution times (creation + insertion + range query + point query), representing the overall throughput of the whole system. It can easily be seen that the throughput was doubled by the introduction of LPUs. The figure clearly shows the improvement of the communication protocol based on the LPU concept (time for PPU size 10) compared to the conventional method of sending single actions (time for PPU size 1). Further, it confirms the property stated above: beyond a given PPU size no further performance increase can be achieved (time for PPU size 25). On the contrary, a larger PPU can lead to a decrease because of the system's overhead for very large message blocks.

Figure 2: execution times in seconds versus PPU size (y-axis: seconds, 0 to 18; x-axis: size of PPU, 1, 2, 10, 25; series: point query, range query, insertion, creation)

Figure 3: overall system’s execution time versus PPU size (y-axis: seconds, 0 to 20; x-axis: size of PPU, 1, 2, 10, 25)

6. Results

We know that the presented analysis of LPUs and PPUs neglects certain topics which can affect the behavior of a distributed system in practice, like network reliability, network topology, processor characteristics, etc. We are aware of these restrictions and derive our conclusions under this assumption. However, we assume that for our analysis the underlying architecture presents a coherent profile to the processes. The results of the heuristic analysis can therefore be summarized briefly:
• LPUs increase the performance of the system dramatically (for the database system the performance was doubled).
• PPUs should be large enough to hold an LPU completely.
• If PPUs are too large, no further performance improvements can be achieved. On the contrary, such PPUs can decrease the overall system throughput.

If the LPU approach is followed in the design of an IPC protocol for message passing systems, we recommend implementing the longest possible LPU sequences. The optimal size of a PPU can only be decided by an exhaustive heuristic analysis. Generally it can be said that the LPU approach provides an appropriate tool to increase the performance of IPC-based systems.

Acknowledgment

Many researchers and students made contributions to the GFDBS-project. On this occasion I want to thank Harald Watzer for performing the comprehensive tests of the new LPU-message passing protocol.

References

[Blumer, Tenney, 1982] Blumer, T.P. and Tenney, R.L., A Formal Specification Technique and Implementation Method for Protocols, Computer Networks, Vol. 6, No. 3, pp. 201-217, 1982

[Bochmann, Gecsei, 1977] Bochmann, G.V. and Gecsei, J., A Unified Method for the Specification and Verification of Protocols, Proceedings of IFIP Congress, pp. 229-234, North Holland, 1977

[Bochmann, Raynal, 1983] Bochmann, G.V. and Raynal, M., Structured Specification of Communicating Systems, IEEE Transactions on Computers, Vol. 32, No. 2, pp. 120-133, 1983

[Carbrera, Hunter, Karels, Mosher, 1988] Carbrera, L.-F., Hunter, E., Karels, M.J., and Mosher, D.A., User-Process Communication Performance in Networks of Computers, IEEE Transactions on Software Engineering, Vol. 14, No. 1, pp. 38-53, 1988

[Ceri, Pelagatti, 1984] Ceri, S. and Pelagatti, G., Distributed Databases: Principles and Systems, McGraw-Hill, 1984

[Nievergelt, Hinterberger, Sevcik, 1984] Nievergelt, J., Hinterberger, H., and Sevcik, K.C., The Grid File: a data structure for relational database systems, ACM Trans. Database Systems, Vol. 9, No. 1, pp. 38-71, 1984

[Peterson, Buchholz, Schlichting, 1989] Peterson, L.L., Buchholz, N.C., and Schlichting, R.D., Preserving and Using Context Information in Interprocess Communication, ACM Transactions on Computer Systems, Vol. 7, No. 3, pp. 217-246, 1989

[Rochkind, 1985] Rochkind, M., Advanced UNIX Programming, Prentice-Hall, 1985

[Schikuta, 1990] Schikuta, E., The Grid File Data Base System, Proc. 10th SCCC International Conference on Comp. Science, Santiago de Chile, Chile, 1990

[Schikuta, 1991] Schikuta, E., A Grid File Based Highly Parallel Relational Data Base System, Proc. 4th ISMM Int. Conference Parallel and Distributed Computing and System, Washington, D.C., USA, 1991

[Son, Chang, 1990] Son, S.H. and Chang, C.H., Performance Evaluation of Real-Time Locking Protocols Using a Distributed Software Prototyping Environment, Proceedings of the 10th Int. Conf. on Distributed Computing Systems, pp. 124-131, 1990

[Tanenbaum, 1981a] Tanenbaum, A.S., Network Protocols, ACM Computing Surveys, Vol. 13, No. 4, pp. 453-489, 1981

[Tanenbaum, 1981b] Tanenbaum, A.S., Computer Networks, Prentice Hall, 1981

[Vossen, 1991] Vossen, G., Data Models, Database Languages and Database Management Systems, Addison Wesley, 1991

[Watson, Mamrak, 1987] Watson, R.W. and Mamrak, S.A., Gaining Efficiency in Transport Services by Appropriate Design and Implementation Choices, ACM Transactions on Computer Systems, Vol. 5, No. 2, pp. 97-120, 1987

[Watzer, 1990] Watzer, H., Ein Hochleistungs-Message Passing System für das GFDBS, master thesis, Univ. of Vienna, 1990, in German

Appendix

The following commands are provided by the GFDBS system. They are grouped according to their logical application domains.

Communication with the server

GFDBSInit()
    starting the communication with the server
GFDBSQuit()
    stopping the communication with the server

Creation and Deletion of tables

GFInitCreate(Gridfilename gfname, Accessmask mask)
    start of the creation of a new database or a new table, specification of the access properties
ADefShort/Long/Float/String(Gridfiledescriptor gfd, Attributename attribute, KeyFlag attkey)
    definition of an attribute, specification of the key property
GFCreate(Gridfiledescriptor gfd)
    creation of a new table
GFDelete(Gridfilename gfname)
    deletion of a table

Manipulation of tables

GFOpen(Gridfilename gfname, Accessmask mask)
    opening of an existing table, specification of the access properties
GFClose(Gridfilename gfname)
    closing of an open table
TInit(Gridfiledescriptor gfd, Tuplebuffer tupleid)
    initialization of a new tuple (insertion, update, query tuple)
APutShort/Long/Float/String(Gridfiledescriptor gfd, Tuplebuffer tupleid, Attributename attributename, Attributevalue value)
    initialization of attributes with a value
TInsert/TUpdate/TDelete(Gridfiledescriptor gfd)
    insertion, update and deletion of the information in the current tuple buffer
TQuery/TNext(Gridfiledescriptor gfd)
    query of a table
AGetShort/Long/Float/String(Gridfiledescriptor gfd, Tuplebuffer tupleid, Attributename attributename, Attributevalue value)
    retrieving attribute values
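To relate the commands listed above to the LPUs of section 5.2, the following Python sketch cuts a client's command sequence into LPUs after each trigger-action. Only the command names and the four trigger-actions come from the paper. The function split_into_lpus and the two sample sequences are hypothetical, and the sketch handles command names only, not the actual GFDBS calls or their arguments.

```python
# Trigger-actions determined in section 5.2.
TRIGGERS = {"GFInitCreate", "GFCreate", "TInit", "TQuery"}


def split_into_lpus(commands):
    """Cut a command sequence into LPUs; each LPU ends with a trigger-action."""
    lpus, current = [], []
    for name in commands:
        current.append(name)
        if name in TRIGGERS:
            # The trigger concludes the current LPU.
            lpus.append(current)
            current = []
    if current:
        # Trailing information-actions without a trigger form no LPU.
        raise ValueError("command sequence does not end with a trigger-action")
    return lpus


# Hypothetical client sequences for a table creation and a query.
creation = ["GFInitCreate", "ADefShort", "ADefString", "GFCreate"]
query = ["TInit", "APutShort", "TQuery"]

creation_lpus = split_into_lpus(creation)
query_lpus = split_into_lpus(query)
```

With these inputs each transaction is split into two LPUs, as in sections 5.2.1 and 5.2.2: the start command on its own, followed by the information-actions together with their concluding trigger-action.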