Modelling the performance of CORBA using Layered Queueing Networks

Tom Verdickt, Bart Dhoedt, Frank Gielen, Piet Demeester
Department of Information Technology, Ghent University, Ghent, Belgium
{tom.verdickt|bart.dhoedt|frank.gielen|piet.demeester}@intec.UGent.be

Abstract

One of the typical features of distributed systems is the heterogeneity of their components (e.g. geographical spreading and different platform architectures), leading to interoperability issues. Many of these issues are handled by generic middleware-based solutions. This paper presents an analytic model of the impact of using such middleware on the overall system performance. Specifically, a Layered Queueing Network is described that models a client/server system using CORBA as a middleware system offering location transparency. The response times estimated from the model are compared to the measured response times for a growing number of clients, in order to assess the accuracy of the model and of its parameter values. This model can then be used when designing a distributed application, before the entire system is installed or even fully implemented.

1. Introduction and previous work

The performance of a software system is a critical aspect of its quality. Nevertheless, current software design processes apply a "fix-it-later" approach to performance: the application is designed to meet the functional requirements, while the non-functional requirements (such as performance) are only considered in the final stages of the development process (often during prototype testing). In order to meet the performance demands, lengthy fine-tuning, expensive new hardware or even a (partial) redesign are then necessary. And even after the fine-tuning, the system might still fail to meet all the requirements.

1.1. Software Performance Engineering

To solve this problem, software engineering techniques have been designed to integrate performance considerations into the design. Performance modelling formalisms and quantitative solution methods are used throughout the entire development cycle (starting as early as possible), to check whether the system performance is satisfactory [12]. This way, the performance requirements are "built-in", rather than added on later. Several modelling formalisms have been proposed, e.g. Petri nets [9], queueing networks [8], process algebras [1], etc. Automated tools exist for all these formalisms to derive performance metrics from the models, either by analytical techniques or by simulation (LQNS [3], SPNP [2]). Recently, efforts have been made to build translation tools that automatically construct performance models starting from more general-purpose software models (not specifically tailored to modelling software performance, e.g. UML models [5]). The advantage of such translation tools is that system designers do not need to learn another modelling formalism. The performance aspects can be added to the models that are already used to describe the functional and architectural aspects of the system (e.g. UML models), which greatly reduces the effort needed for performance analysis. The construction of the performance models is then done by automated tools. In this paper, Layered Queueing Networks (LQNs) are used to model system performance [3, 15]. LQNs are an extension of queueing networks, which are widely used to model the performance of computer systems. They allow the description of more complex interactions between components, needed when modelling real-life client/server systems.

1.2. Middleware

In order to meet the ever-growing demand for processing power and to cope with the evolution towards heterogeneous, geographically spread sources of data, storage and processing power, distributed systems are receiving more and more attention. Often, middleware is used in distributed systems to


provide interoperability between the various components of the system. Using middleware might also offer extra benefits, such as a naming service (giving a form of location transparency), event handling, etc. The Common Object Request Broker Architecture (CORBA) [10] is an important middleware standard. The growing interest in distributed systems induces a growing interest in performance engineering techniques for those systems. Unfortunately, current performance engineering methodologies lack support for some key aspects of distributed systems, like the performance impact of using middleware technology. This paper tries to start filling the gap between distributed systems and performance modelling by presenting a model of a client/server system using an implementation of CORBA as middleware. In order to validate the results, a number of experiments have been done using ORBacus 4.1.0.

Past efforts to model and predict the performance of CORBA-based distributed systems have resulted in several performance modelling frameworks for CORBA. In [13], some extensions to the SPE·ED tool are introduced in order to model CORBA-based distributed systems. This allows a detailed performance analysis (at the level of single calls), but does not present the overhead incurred by the middleware separately, leaving the burden of modelling the middleware on the performance analyst. It would be useful to have a "ready-made" middleware model that could be used in the model of the complete system and would allow the performance analyst to model the system without considering the middleware details. Another framework is presented in [7], which models the different conceptual layers of a distributed system (e.g. the application layer and the network layer) and connects those layers to obtain performance estimates for the entire system and its components. The middleware (along with some other infrastructural components) fits in a separate layer, which facilitates modelling, but once again the performance analyst needs to model the middleware himself. A more conceptual abstraction level is used in [11]. The middleware overhead is modelled by a separate component, but no distinction is made between the different sources of the overhead, like (un)marshalling call parameters, server look-up using the Naming Service, etc. The work reported in this paper attempts to find a balance between the abstraction levels used in previous middleware modelling frameworks, modelling conceptually different sources of overhead by separate components. The resulting model can easily be plugged into models of distributed systems, without the performance analyst needing to know the architectural details of the middleware (only some performance details of the middleware components need to be filled in). It also makes it possible to pinpoint bottlenecks in the middleware more accurately, as the model results show how much time is spent in each part of the middleware.

The remainder of this paper is organized as follows. Section 2 gives an overview of the LQN modelling formalism. The CORBA middleware architecture is presented in section 3. Section 4 describes the setup used for validation of the performance model. The model itself is presented in section 5. A comparison of the output of the performance model and the measured results is given in section 6. Section 7 presents our conclusions.

2. The LQN formalism

LQN is an extension of the widely used queueing network model. The toolset described in [3] contains analytical solvers and a simulator to extract performance estimates from an LQN model. The most important difference between LQNs and traditional queueing networks is that a server serving a client request can itself become a client of another server, thus modelling nested services and synchronous calls. This way, a concept of layering is introduced. The layering is not strict, though: calls can target servers in the same layer as the client or can skip several layers. An LQN model is represented as an acyclic, directed graph with two types of nodes: tasks (software entities, drawn as parallelograms) and processors (representing all types of hardware devices, drawn as circles). Arcs represent service requests, either synchronous (when the client makes the request, it blocks until it receives a response from the server) or asynchronous. Tasks can be either reference tasks (pure clients, with only outgoing arcs), pure servers (leaf nodes, with only incoming arcs) or an intermediate form with both incoming and outgoing arcs (server to some tasks, client to others).

Tasks can provide more than one type of service (e.g. a database can provide support for both searching and updating). This is represented by dividing the task parallelogram into several smaller parallelograms (called entries), each representing one type of service. Request arcs are then drawn to the sub-parallelograms. Every task has at least one entry. Every entry can be divided into one or more phases. When a request arrives at the task (and the task can serve the request), the first phase is executed. After the first phase, the response is sent to the client, so the client is unblocked and can continue its work. All other phases (usually only a second phase, if more than one phase is specified) are executed after the response is sent, so they model some sort of post-processing of the request. All entries have their own execution times (specified per phase) and can make calls to other entries. The number of calls made can be fixed or can vary with a given mean and standard deviation.


Servers and hardware devices always have a single infinite queue in which arriving requests wait until they are served. Servers (and hardware devices) can be single-servers or multi-servers, indicating the number of requests that can be served concurrently. Several scheduling algorithms are possible, e.g. first-come first-served scheduling, priority-based scheduling, processor sharing (only for processors), etc.

Figure 1 presents a small example of an LQN model of a database server. One or more concurrent clients are running on the same processor a. Each client is split into two separate tasks. The database server (running on processor b) is divided into two entries, modelling a database search and an update respectively. The client first performs some initial calculations and searches the database (modelled by the task search). After some further work (e.g. printing the results of the search), it calls the database again to perform an update (as modelled by the task update). The database is a single-server, indicating that a client has to obtain a lock to perform searches or updates.

Figure 1. An LQN model example (client tasks search and update on processor a, and the database task with its two entries on processor b)
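As a side note, the entities of such a model can be written down as plain data. The C++ structures below are only an illustration (they are not any tool's input syntax): they sketch a slightly simplified variant of the figure 1 example, with the client side collapsed into a single reference task and with invented service demands, purely to make the task/entry/phase/call structure concrete.

    #include <string>
    #include <vector>

    // One synchronous request from an entry to another entry.
    struct Call  { std::string target; double meanCalls; };
    // One phase of an entry: a CPU demand plus the calls made during that phase.
    struct Phase { double demandMs; std::vector<Call> calls; };
    struct Entry { std::string name; std::vector<Phase> phases; };
    struct Task  { std::string name; std::string processor;
                   bool isReference; int multiplicity;
                   std::vector<Entry> entries; };

    // Simplified variant of figure 1: nClients concurrent clients on processor a,
    // each calling the two entries of the single-server database on processor b.
    // All demand values (in ms) are invented for the sake of the illustration.
    std::vector<Task> exampleModel(int nClients)
    {
        Entry dbSearch{ "db_search", { Phase{ 5.0, {} } } };
        Entry dbUpdate{ "db_update", { Phase{ 8.0, {} } } };
        Task  database{ "database", "b", false, 1, { dbSearch, dbUpdate } };

        Entry clientE { "client_e",
                        { Phase{ 1.0, { Call{ "db_search", 1.0 },
                                        Call{ "db_update", 1.0 } } } } };
        Task  client  { "client", "a", true, nClients, { clientE } };

        return { client, database };
    }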


A number of simulation tools and analytical solvers exist to obtain performance estimates from an LQN model, including the execution times of an entry (or a phase of an entry), the mean waiting time for a given call, processor occupation, throughput of a certain task, etc.

3. CORBA

The CORBA standard as specified in [10] is an open standard and does not specify implementation details. Examples of CORBA implementations are ORBacus [6], JacORB, OmniORB [4], etc. The implementation we used is ORBacus. The main part of every CORBA implementation is the Object Request Broker (ORB). It provides a way to send calls from a client object to a server object, irrespective of their architecture, programming language, etc. To accomplish this, it uses a system of stubs and skeletons (sometimes different names are given to stubs and skeletons, but we will consistently use these names). Figure 2 gives an overview of the CORBA architecture. The ORB interface is used to communicate with the ORB (mainly during initialization). When a client wants to make a request to a server, it sends its request to the stub. The stub will pass the request on to the ORB, which will make sure it arrives at the (remote) server. When the request arrives at the server ORB, the ORB will deliver the request to the skeleton (the server-side equivalent of the stub). The skeleton then forwards the request to the server, which handles it. The answer then follows the same path in the opposite direction, back to the client. In this way, the client gets the impression that the server resides on the same computer as itself (providing some form of location transparency). The stub and skeleton will perform some operations on the request and the response (marshalling and unmarshalling), to transform the data (e.g. parameter values) from the native format to a language-independent "wire" format and back. This allows cooperation between clients and servers implemented in different programming languages and running on various platforms.

Figure 2. An overview of CORBA (client, stub and client-side ORB communicating over the network with the server-side ORB, skeleton and server)

Of course, in order to send requests to the server, the client must somehow get hold of a reference to the server (indicating for example its location in the network and the port it is listening on). One mechanism to achieve this (and the one used in the research performed for this paper) is the Naming Service (NS). It binds canonical names to object references and can be queried by a client to obtain a reference to a server object. Looking at this description of CORBA, it is clear that there are some points that will influence the performance of the system. The marshalling and unmarshalling of requests and responses in the stub and the skeleton will incur some overhead, and the NS could become a bottleneck when many clients want to obtain server references simultaneously. These aspects should be modelled in order to obtain realistic performance indications for CORBA-based distributed systems.
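The server side of the Naming Service interaction can be made concrete with a small sketch (an assumption on our part, not code taken from the paper): the server creates a servant, obtains an object reference for it and binds that reference under a canonical name, after which clients can resolve it as described above. The servant class Hello_impl (implementing a generated POA_Hello skeleton), the name "HelloServer" and the OB/ header names are illustrative, following the usual ORBacus conventions.

    #include <OB/CORBA.h>          // ORBacus headers (assumed names)
    #include <OB/CosNaming.h>
    #include "Hello_impl.h"        // hypothetical servant implementing POA_Hello

    int main(int argc, char* argv[])
    {
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // Activate the Root POA so that incoming requests can be dispatched
        CORBA::Object_var poaObj = orb->resolve_initial_references("RootPOA");
        PortableServer::POA_var poa = PortableServer::POA::_narrow(poaObj);
        PortableServer::POAManager_var mgr = poa->the_POAManager();
        mgr->activate();

        // Create the servant and obtain a CORBA object reference for it
        Hello_impl servant;
        Hello_var hello = servant._this();

        // Bind the reference under a canonical name in the Naming Service
        CORBA::Object_var nsObj = orb->resolve_initial_references("NameService");
        CosNaming::NamingContext_var nc =
            CosNaming::NamingContext::_narrow(nsObj);
        CosNaming::Name name;
        name.length(1);
        name[0].id   = CORBA::string_dup("HelloServer");
        name[0].kind = CORBA::string_dup("");
        nc->rebind(name, hello.in());

        // Hand control to the ORB: wait for and dispatch client requests
        orb->run();
        orb->destroy();
        return 0;
    }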


4. Model initialization and validation A simple client/server system was built to serve as a test-case for obtaining the inputs to our performance model (execution times for relevant components). The client performs some initialization, obtains a reference to the server from the NS, makes a request to the server, waits for the response and shuts down (after some clean-up like destroying the ORB). The server and the NS both run on a dedicated computer. One client (on which all the measurements were done) runs on a third computer. A fourth computer is used as a load generator, running all the other clients (or rather a somewhat optimized equivalent of those clients). The computers are connected via a fast ethernet LAN.
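The client source is not reproduced in the paper; the following is only a minimal sketch, under stated assumptions, of a CORBA/C++ client going through the steps just described (initialize the ORB, resolve the server via the NS, make one request, clean up). The IDL interface Hello, its operation sayHello(), the generated header Hello.h and the naming entry "HelloServer" are illustrative assumptions; the OB/ header names follow the usual ORBacus conventions.

    #include <OB/CORBA.h>          // ORBacus ORB header (assumed)
    #include <OB/CosNaming.h>      // ORBacus Naming Service header (assumed)
    #include "Hello.h"             // stub generated from the hypothetical Hello.idl
    #include <iostream>

    int main(int argc, char* argv[])
    {
        try {
            // Initialization: create and configure the ORB
            CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

            // Obtain a reference to the Naming Service
            CORBA::Object_var nsObj =
                orb->resolve_initial_references("NameService");
            CosNaming::NamingContext_var nc =
                CosNaming::NamingContext::_narrow(nsObj);

            // Resolve: ask the NS for the server reference by its canonical name
            CosNaming::Name name;
            name.length(1);
            name[0].id   = CORBA::string_dup("HelloServer");
            name[0].kind = CORBA::string_dup("");
            CORBA::Object_var obj = nc->resolve(name);

            // Narrow the reference to the expected type (this cast is what
            // triggers the small "confirmation request" mentioned in section 5)
            Hello_var hello = Hello::_narrow(obj);

            // The actual (useful) request to the server
            CORBA::String_var reply = hello->sayHello();
            std::cout << reply.in() << std::endl;

            // Clean-up: destroy the ORB and free its resources
            orb->destroy();
        } catch (const CORBA::Exception&) {
            std::cerr << "CORBA exception caught" << std::endl;
            return 1;
        }
        return 0;
    }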

Figure 3. The test setup (the measured client, the load generator, the server and the Naming Service, connected by a Fast Ethernet LAN)

The application used was a distributed version of the well-known "Hello world" application, a very simple program with a very small execution time. As a result, the overhead incurred by the middleware plays a very important role in the overall execution times, so inaccurate estimates due to modelling errors are not accidentally hidden by large server execution times. This makes it easier to fine-tune and correctly validate the model. The server and the client were implemented in C++, and all components used ORBacus 4.1.0 as the CORBA implementation. The measurements were performed by instrumenting the source code of the client, the server, the NS, and the stub and skeleton. Response times and execution times, measured with only one client running, were used as inputs to the performance model. Validation of the model was then done by measuring the response times for a growing number of concurrent clients and comparing them to the predicted values, obtained by simulating the model with the same number of clients. The results are presented in section 6.
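The instrumentation code itself is not shown in the paper; the following is only a minimal sketch (an assumption on our part) of the kind of wall-clock timing wrapper that could be placed around the instrumented calls in the client, the stub, the skeleton, the server and the NS.

    #include <sys/time.h>
    #include <cstdio>

    // Prints the wall-clock time spent between construction and destruction,
    // so a measurement is taken by scoping a timer around the call of interest.
    class WallClockTimer {
    public:
        explicit WallClockTimer(const char* label) : label_(label) {
            gettimeofday(&start_, 0);
        }
        ~WallClockTimer() {
            timeval end;
            gettimeofday(&end, 0);
            double ms = (end.tv_sec - start_.tv_sec) * 1000.0 +
                        (end.tv_usec - start_.tv_usec) / 1000.0;
            std::fprintf(stderr, "%s: %.3f ms\n", label_, ms);
        }
    private:
        const char* label_;
        timeval start_;
    };

    // Example use (the call being timed is hypothetical):
    //   { WallClockTimer t("resolve server reference"); obj = nc->resolve(name); }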

5. LQN model of a client/server architecture using ORBacus

The LQN model of the test case presented above is given in figure 4. To enhance readability, we used the non-standard notation of drawing all arcs from tasks to processors as dashed lines (dashed arcs usually represent call forwarding).

Figure 4. The LQN model

The tasks client and server represent the useful work that is being done. The other tasks on the client side and the server side model the different forms of overhead incurred by using ORBacus. NS is the performance model for the Naming Service. The task corba client is nothing more than an extra reference task with no execution time, calling the different parts of the model. The network is divided into multiple entries; this models the fact that e.g. the call messages and the responses have different sizes and thus take a different amount of time to travel through the network.

The client side can largely be divided into four parts. First, there is an initialization phase, initializing the ORB, getting a reference to the Naming Service, etc. Next, the Naming Service is used to find a reference to the server object; both the request to and the answer from the NS are sent through the network. Once a reference to the server is obtained, the real (useful) work can start.


The useful work on the client side is modelled by the task client. Its execution time (and possible calls to other tasks) is totally dependent on the application at hand; it can range from a thin client, performing close to no work, to a major bottleneck of the system. Afterwards, some clean-up operations (like destroying the ORB and freeing resources) are done.

The client task sends a request to the server after it has performed the necessary client-side calculations. The mechanism is as explained in section 3. The client sends its request to the stub. The stub performs some marshalling of the arguments and forwards the call to the server, over the network. At the server side, the call is received by the skeleton, which unmarshals the arguments and forwards the request to the server, where it can be handled. After the request is handled by the server, the response is returned to the client, again using the network. Note that the marshalling and unmarshalling of the response is not modelled separately. It is implicitly modelled by adding the time of marshalling the request and the time of unmarshalling the response and taking that sum as the execution time of the marshalling part of the stub. The same can be done for the skeleton. This does not change the performance results obtained from the model, because the exact moment of execution does not affect the average execution time of the task (although it could of course affect the execution times of the individual runs of a single component). Another thing worth noting is that there is no separate task for the ORB. This is not necessary, since every stub has its own ORB, so combining the overhead incurred by the ORB with that incurred by the stub yields the same results as dividing it into two separate tasks.

It is important to note that the Naming Service, the skeleton and the server are modelled as multi-servers, meaning that they can serve multiple requests concurrently. Because they run on single-processor systems, only a single request is ever executing at any moment in time, but the executions of multiple requests are interleaved: when the handling of one request gets blocked (e.g. because the skeleton is awaiting the response from the server), another request can be handled. This does not influence the throughput of the system (requests can only be handled at the speed of the single processor), but it has a big influence on the average response time. This can be seen very easily. When, say, three requests arrive simultaneously, handling them FCFS (as a single-server would) means the response time of the first request is the handling time, that of the second request twice the handling time and that of the third three times the handling time; the average response time is two "handling times". When the requests are handled by a multi-server, their handling is interleaved: part of the first request is handled, then part of the second, etc. The result is that the requests have an almost equal response time of about three times the handling time. The throughput is the same as in the single-server case, but the average response time is higher.
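Written out for n simultaneous requests that each need a handling time D on a single processor (a small worked calculation, with n and D as generic symbols rather than values taken from the measurements):

    FCFS (single-server):        R_k = k D,  so  \bar{R}_{FCFS} = \frac{1}{n} \sum_{k=1}^{n} k D = \frac{(n+1) D}{2}
    Interleaved (multi-server):  R_k \approx n D  for every request,  so  \bar{R}_{PS} \approx n D
    Throughput:                  X = n / (n D) = 1 / D  in both cases

For n = 3 this gives an average response time of 2D under FCFS versus roughly 3D when the requests are interleaved, matching the numbers in the example above.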

In order not to overcomplicate figure 4, we left out some minor interactions. In fact, when a reference to the server is obtained through the Naming Service, an initial request is immediately sent to the server when casting the reference received from the NS to the proper type (a sort of "confirmation request", asking the server whether it really is the type of server it appears to be). The same is true when obtaining a reference to the Naming Service. These calls are not vital to the model, since they require only a very small amount of processing time. They could play an important role when some of the resources (the NS, the server or the network) get heavily loaded, but we did no measurements under such conditions.

6. Model results

Figure 5 shows the response time of the Naming Service when asked for a reference to the server, as a function of the total number of clients. The model seems to be very accurate up to 60 simultaneous clients: the difference between the estimated response times and the measured times does not exceed 3% of the measurements. The reason for the much larger estimation errors with more than 60 clients is that the load generator starts to get overloaded. The load generator has a separate process for each "client" it simulates. Even though the load-generating processes sleep most of the time, when a large number of them are running concurrently, they start to contend for the processor. As a result, the load generator cannot sustain the linear increase in request frequency. Instead, the request frequency rises more slowly and eventually even decreases with a growing number of simulated clients. Because the request frequency (and thus the load on the Naming Service and the server) decreases, the response time also decreases. This happens a bit earlier in the real system than in the model, because process switching overhead was not included in the model of the load generator. That accounts for the larger deviation for more than 60 clients. When even more clients are simulated, the estimated response time also starts to drop.

A very similar behavior can be observed when studying the response time of the server (see figure 6). More precisely, the graph shows the response time of the call made by the client to the stub, so it still includes some execution on the client side. The estimation errors are less than 2% of the measured response time. As noted earlier in section 4, the largest part of the measured response time is due to overhead (more than 75% of the response time is spent in the stub, only about 3% is actually spent in the server, performing the desired function).


Figure 5. Delay to obtain a reference to the server (model vs. measurements, time in ms, for 1 to 90 clients)

Figure 6. Response time of the server call (model vs. measurements, time in ms, for 1 to 90 clients)

This means that the estimates of the overhead are as good as the estimates of the total response time. The execution times of the other tasks contain no interesting information. Those tasks make no use of the Naming Service, the server or any resource other than the client processor. Since the client resides on its own dedicated processor, the execution times of the other components show no variation as a function of the number of clients.

7. Conclusions

This paper presents a performance model for client/server systems using CORBA as middleware. The performance modelling formalism used was the Layered Queueing Network model. Experiments show that the proposed model produces very accurate estimates of the response time, both for calls to the Naming Service (less than 3% estimation error with up to 60 concurrent clients) and for calls to the server (2% error up to 100 clients). Such estimates can prove very valuable at the early design stages, because they make it possible to quickly spot performance problems and solve them while the cost of redesigning the application is still small, all the more so because the modelling process is fast and fairly easy. The accuracy of our analytical model allows us to add performance information to architectural patterns and to take performance engineering into account in the design phase of the system. Future research will design similar models of other middleware architectures, like Java RMI, Jini and adaptive middleware technologies. The extension to other architectural patterns (e.g. blackboard, peer-to-peer) will also be investigated.

8. Acknowledgement

I would like to acknowledge the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen) for the doctoral fellowship supporting this research. I would also like to thank Murray Woodside and the Real-Time and Distributed Systems Group at Carleton University in Ottawa, Canada for their support on LQN modelling and for providing the tools to simulate and solve the LQN models.

References

[1] M. Bernardo, L. Donatiello, and R. Gorrieri. Modeling and analyzing concurrent systems with MPA. In Proc. of the 2nd Process Algebra and Performance Modeling Workshop, July 1994.
[2] G. Ciardo, J. Muppala, and K. Trivedi. SPNP: Stochastic Petri net package. In Proc. of the Third International Workshop on Petri Nets and Performance Models (PNPM89), pages 142-151. IEEE Computer Society Press, 1990.
[3] G. Franks, A. Hubbard, S. Majumdar, J. Neilson, D. Petriu, J. Rolia, and M. Woodside. A toolset for performance engineering and software design of client-server systems. Performance Evaluation, 24(1-2):117-135, February 1995.
[4] D. Grisby, S.-L. Lo, and D. Riddoch. The omniORB version 4.0 Users Guide. 2002.
[5] P. G. Gu and D. C. Petriu. XSLT transformation from UML models to LQN performance models. In Proc. of the 3rd Int. Workshop on Software and Performance (WOSP 2002), pages 227-234. ACM Press, July 2002.
[6] Iona Technologies. ORBacus for C++ and Java, version 4.1.0. Dublin, Ireland, 2001.


[7] P. Kähkipuro. Performance Modeling Framework for CORBA Based Distributed Systems. Ph.D. thesis, 2000.
[8] E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, New Jersey, 1984.
[9] M. A. Marsan, G. Conte, and G. Balbo. A class of generalized stochastic Petri Nets for the performance evaluation of multiprocessor systems. ACM Transactions on Computer Systems, 2(2):93-122, May 1984.
[10] Object Management Group. The Common Object Request Broker: Architecture and Specification. Needham, MA, USA, 2002.
[11] D. Petriu, H. Amer, S. Majumdar, and I. Abdul-Fatah. Using analytic models for predicting middleware performance. In Proc. of the Second International Workshop on Software and Performance, pages 189-194, September 2000.
[12] C. U. Smith. Designing high-performance distributed applications using software performance engineering: A tutorial. In Proc. of the Computer Measurement Group, December 1996.
[13] C. U. Smith and L. G. Williams. Performance engineering models of CORBA-based distributed-object systems. In Proc. of the CMG Conference, pages 886-898, December 1998.
[14] P. Tůma and A. Buble. Overview of the CORBA performance. In Proc. of the 2002 EurOpen.CZ Conference, September 2002.
[15] M. Woodside. Tutorial Introduction to Layered Modeling of Software Performance. Ottawa, Canada, 2002.

