Layered Performance Modelling of a CORBA-based Distributed Application Design

Fahim Sheikh+, Jerome Rolia+, Pankaj Garg*, Svend Frolund*, Allan Shepherd*
{sheikhf,jar}@sce.carleton.ca, {garg, frolund, shepherd}@hpl.hp.com
+ Carleton University, Ottawa, CANADA
* Hewlett-Packard Laboratories, Palo Alto, California, USA

Abstract
With advances in distributed platforms, distributed application systems on heterogeneous platforms have become a reality. These systems are complex, with software and hardware components that are distributed geographically, sometimes globally, yet they combine to provide a single processing model. Deciding how to distribute software and hardware resources for these systems is a major challenge and is influenced by factors such as security, reliability, manageability, and performance. In this paper, we focus on the performance behaviour of a design for a CORBA-based Distributed Network Management Application. Once the design is established and the system’s development begins, performance problems resulting from a poor design can be difficult and expensive to fix. We built both simulation and analytic models, and used the models to help recognize such problems early. We discuss the proposed system architecture, simulation and analytic models for the application, and extensions to the analytic models needed to support features present in the design. KEYWORDS: analytic modelling, layered queueing modelling, distributed applications, scalability analysis
1.0 Introduction

Many organizations rely a great deal on their continuously growing computing infrastructures. Network management systems help to keep an inventory of the devices and servers that are part of the system and to record and make available status information about them. The information can be used to help recognize and locate the sources of problems. As the number of resources being managed grows and the systems become more distributed, current network management technologies become inadequate. The need for a greater number of network administrators increases, adding to the cost of management. A group at Hewlett-Packard has proposed a design¹ for a distributed network management application
that is intended to overcome such limitations. It relies on the CORBA distributed platform and aims to support global organizations with many interconnected network management domains, fewer operators, and a very large number of managed resources per domain. One aspect of our study has been to discover design alternatives that enable the final system's ability to support its intended target environments. Since the design under study is complex and has not yet been implemented, the study had to be conducted using either simulation or analysis. In [2], it was shown that simulation can be used to investigate the performance and scalability of distributed applications. Simulation is viewed as a more intuitive, expressive and reliable approach. Analytic models can provide predictions quickly, thereby enabling a large number of experiments to be conducted on the models. Furthermore, once the system has been deployed, an analytic model can be carried onwards to help support the on-line performance management and trend analysis of the distributed application. Simulation may be too slow for on-line management. For a designer, being able to investigate different design alternatives easily and efficiently is a necessity. Often designers intuitively know that certain solutions can have performance implications, but they do not have the tools to explore them. In this study there were over 100 parameters for the design model. Since many parameters are estimates, the predicted performance numbers from the simulation or analytic model are not interpreted literally. What is more important is the ability to support "what if" analysis and determine whether the impact of a proposed design change is positive or negative. Smith [23] presents performance engineering methods for describing software transactions in submodels. Our approach is based on these methods. These submodels can then be integrated to create system level models for simulation or analysis.
Analytic models are an attractive alternative to simulation, particularly when many different designs need to be evaluated. Traditional approaches to analytic modelling include product-form queueing for performance, and Markov models and Markov reward models for reliability and performance. In [18], Ibe, Choi, and Trivedi used Stochastic Petri-nets to generate Markov models and evaluate the performance of a client-server system. However, due to state space explosion, we do not believe their approach is feasible for the system presented in this paper. Queueing Network Models, on the other hand, were used by Jenq, Kohler and Towsley [20] to model a distributed database system. Contention for software components was reflected in the models using formulas that are difficult to generalize. In [17], it was shown that a corresponding layered model more closely resembles the system's architecture and provides similar results. An advantage of simulation and Markov based models over queueing network based techniques is their ability to capture synchronization behaviour more naturally. In this paper we focus on whether analytic techniques can be used for this complex problem. Layered Queueing Models (LQMs) [5][15] have been proposed for such systems. LQMs are extended QNMs that consider contention for both software and hardware resources. However, these techniques have not been validated for such a large and complex system. To decide, both simulation and analytic models were developed. The simulation model was described and evaluated using SES Workbench, and the analytic model was developed using LQMs and the Method of Layers (MOL). Using our experience with the models we address the following questions:

• Are LQMs and the MOL robust enough to describe and predict the behaviour of a complex distributed application?
• Are the same conclusions drawn using the analytic model as the simulation model?
• How efficient is the MOL (solution time) compared with simulation?
• Is the analytic model easier to work with?

1. Disclaimer: Although we are presenting model and analysis results of this design for academic and scientific reasons, these results should not be construed as bearing any relationship to any known or planned HP product.
To describe the model for this application, several new features had to be added to LQMs. These included:

• Automated calculation of midware and networking overheads
• Asynchronous messaging and join delays
• A notation to express hierarchical symmetries in the system description that enabled:
  • A compact model description
  • Efficient solutions even for models of very large systems
We built models for a proposed distributed management application design and compared the performance estimates of the analytic and simulation models for a number of system configurations. Section 2 gives an overview of a distributed management application design. It also identifies the different types of software interactions present in the proposed system. The simulation modelling approach is given in section 3. Section 4 describes LQMs, the new
features, and the corresponding analysis. The resulting LQM is validated with respect to the simulation model in Section 5. Our conclusions are given in Section 6.
2.0 Case Study: A Distributed Network Management Application

Consider the networks of a large, geographically distributed enterprise. Each network interconnects a number of devices such as computers, printers, routers, and so on. Since these infrastructures are becoming increasingly critical for the operation of enterprises, it is important that they are managed efficiently and cost effectively. Management involves monitoring the operation of the network in order to detect device failures, determine device loads, and detect link failures. Management also involves taking corrective actions when failures and overloads occur. Current network management applications allow a number of operators to monitor and manipulate a network of devices. The proposed design supports better scalability, and thereby the management of more devices by fewer operators. Though confidentiality agreements prohibit us from disclosing details of the design, key features relevant to this study are explained by example in this section.

Figure 1 illustrates the architecture of the proposed distributed management application. The application consists of a number of processes that each run on nodes in the distributed environment. A process has a single address space, analogous to a UNIX or NT process. Processes communicate by means of various inter-process communication mechanisms such as Remote Procedure Calls (RPCs) or sockets. The communication between two processes that are running on different computers is mediated by a communications network. There are two kinds of processes expressed in the design. Client processes present a graphical user interface to network operators. Layers of Server processes contain the actual management functionality. Operators issue requests to a graphical user interface. In response to these operator requests, the Client processes communicate with the Server processes, which perform the actual management tasks.
The distributed management application distinguishes among different types of hosts depending on whether the hosts are running client or server processes. A deployed distributed management application supports very large numbers of devices and management domains by increasing the numbers of each of these types of nodes. More operators mean more operator nodes, and more devices require more device Server nodes. Larger management domains result in an increase in all of these and this results in an increase in the load on a shared wide area network. Thus scalability is affected by making appropriate choices for the numbers of these nodes and domains. Performance evaluation helps to better
understand the relationships between the relative numbers of each of these nodes and domains needed to support various target environments.

[Figure 1 shows the architecture: operators at Client Hosts run Client Processes, which communicate over LANs and a shared WAN backbone with Server Processes running on Server Hosts in the managed domains.]

Figure 1 A model of a distributed network management application
An operator request may result in activities at multiple processes. A transaction is the total (distributed) activity created in response to an operator request. The starting point of a transaction is an operator request, and the end point of a transaction is presenting the result of the request to the user. A Use Case Map [6], in Figure 2, is used to illustrate the software interactions appearing in the system's transactions. The actual transactions are more complex in nature as they involve more relationships of the kind shown in Figure 2 and span multiple management domains. In Figure 2, a start point indicates the start of the transaction (initial stimulus) and an end bar indicates the end of the transaction or the end of any other thread of control that was started as a result of the transaction (i.e. 3 in Figure 2). During the course of a transaction, we observe both synchronous and asynchronous message passing between processes (2, 3). We also see the "forking" and "joining" of parallel execution paths (4).
Three transactions were chosen for the model. They are identified as: T1, T2, T3. These transactions were selected in consultation with end-users, user-interface design teams, and by considering the total performance impact of the various transactions. Other transactions are modelled as environment processes that compete for system resources such as CPU and disk.
[Figure 2 depicts a Use Case Map spanning a Client Host (with Terminals) and two Server Hosts. Paths run from start to finish through numbered points: (1) three different transaction types, (2) synchronous RPCs, (3) asynchronous messages, and (4) fork-join behaviour, where a fork launches an asynchronous activity that is later joined.]

Figure 2 Transaction Use Case Map to illustrate the software interactions in a Distributed Management Application
The notion of a transaction captures an end-to-end perspective on the performance of distributed applications. We use the duration, or end-to-end mean response time of transactions as a key performance metric for evaluating the design. This performance metric focuses on application performance as experienced by the users of the application. We are concerned with how the response time increases as the number of devices managed in each domain and the number of domains increases. Device utilizations were also captured to determine how much capacity is needed to support various configurations.
3.0 Damson: A Simulation Model for Distributed Management Applications

In order to systematically and accurately predict the end-to-end response time of transactions, we built a simulation model of the proposed design. Since response times were the main concern, the model only describes those features of the design that consume computing resources. Discrete event driven simulation is used to predict queueing delays for these resources and the overall transaction response times. The following physical resources were represented in the model:

• Disk
• Memory
• CPU
• Network bandwidth
The sequence of resources used by each transaction was encoded in an SES Workbench model. Each step in the transaction consumes a specified amount of physical resource or chooses between alternative paths. During the simulation, queueing delays arise if the modelled resource is already busy. In addition to the above physical resources, we also take into account contention for logical resources that are present in distributed application systems. These include processes and shared data structures. For example, transactions contend for access to multi-threaded servers, and threads inside the server compete for locks to shared data structures. These resources are also represented as queues, and access to them is encoded in the SES model transactions. The following logical resources were included in the model:

• Thread pools for multi-threaded servers
• Critical sections protected by mutual exclusion
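Contention for such a logical resource can be illustrated with a few lines of discrete-event simulation. The sketch below is not the SES Workbench model; it is a minimal, self-contained illustration (function and parameter names are ours) of how a limited thread pool produces queueing delay on top of service time:

```python
import heapq
import random

def mean_response_time(n_threads, arrival_rate, mean_service, n_requests, seed=1):
    """Simulate FIFO contention for a pool of server threads.

    Requests arrive as a Poisson stream; each holds one thread for an
    exponentially distributed service time. Returns the mean end-to-end
    response time (queueing delay plus service)."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * n_threads          # earliest time each thread is free
    heapq.heapify(free_at)
    total = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(arrival_rate)           # next arrival time
        start = max(t, heapq.heappop(free_at))       # wait if all threads busy
        finish = start + rng.expovariate(1.0 / mean_service)
        heapq.heappush(free_at, finish)
        total += finish - t                          # response = wait + service
    return total / n_requests
```

Enlarging the thread pool at fixed load drives the simulated response time down toward the bare service time, which is exactly the effect the logical-resource queues in the model are there to capture.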
The simulation model was instrumented to measure the mean and standard deviation of the end-to-end simulated response times of transactions. Figure 3 illustrates the overall structure of the simulation model. The model has four views:

• Application: describes the transactions' flow of control, their use of processes, and IPC mechanisms
• System: describes system level hardware and software
• Environmental: describes other work that will also share the system and application resources
• Configuration: maps the application and environmental components onto system components
For example, the configuration model describes which computer a given application process runs on. The application model describes the entities that generate work or influence the amount of work for the system configuration under consideration. By structuring the simulation model according to these views, we simplified the task of reflecting the changes needed to support large numbers of devices in the model.
[Figure 3 shows the workload model (managed devices and operators) driving the application and environment models, whose processes and communication are mapped by the configuration model onto the system model's nodes and network.]

Figure 3 The overall structure of the Network Management Application Simulation Model
In the case of the distributed management application, work is generated asynchronously by operators as T1, T2, and T3 requests. The number of devices being managed affects the branching probabilities and loop counts as encoded in the transactions and therefore affects the amount of resources consumed by transactions and the relationships between the number of nodes and management domains.
4.0 Layered Queueing Models and the Method of Layers

In this section, we describe QNMs and LQMs, and include basic MVA equations. These equations are modified in subsequent sections of the paper to describe the analysis needed to model features present in the distributed management application. Queueing network models (QNM) [7] are an abstraction for computer systems that have been in use since the early 1970s. QNMs include a description of the service centres of a system and the customers that use them. There are several analytic performance evaluation techniques that can be used to predict their behaviour. Common techniques for predicting the performance measures of QNMs are exact mean value analysis (MVA) [7],[13],[12] and approximate MVA [21],[8]. MVA estimates the average queue lengths of customers at servers and mean response times of customers. Queueing network models are often used
to study the performance behaviour of centralized systems such as mainframes and other centralized servers. They are appropriate for describing contention for the physical resources (CPU, Memory, Disk, Network bandwidth) listed in Section 2, but do not capture the performance impact of access to logical resources (thread pools, critical sections) that are also present in the system we study. LQMs are QNMs extended to reflect interactions between client and server processes. The processes may share devices, and server processes may also request services, by RPC, from other server processes. This layering leads to the name of the LQM and makes the model appropriate for describing distributed application systems such as CORBA, DCE, OLE and DCOM applications. In these applications a process can suffer queueing delays both at its node's devices and at its software servers. If these software delays are ignored, response time and utilization estimates for the system will be optimistic. The Method of Layers (MOL) [14], [5] and Stochastic Rendezvous Network (SRVN) [1] techniques have been proposed as performance evaluation techniques that estimate the performance behaviour of LQMs. Both evaluation techniques are based on approximate MVA. The MOL is an iterative technique that decomposes an LQM into a series of QNMs. Performance estimates for each of the QNMs are found and used as input parameters to other QNMs. The purpose of the MOL is to find a fixed point where the predicted values for mean process response times and utilizations are consistent with respect to all of the submodels. At that point, the results of the MVA calculations approximate the performance measures for the system under consideration.
Intuitively, this is the point at which predicted process idle times and utilizations are balanced so that each process in the model has the same throughput whether it is considered as a customer in a QNM or as a server (the rate of completions of the server equals the rate of requests for its service), and the average service time required by callers of a process equals its average response time.

In LQMs, a workload class c is used to represent each type of process on a node. The following are the parameters for an LQM:

• process classes and their populations
• devices and their scheduling policies
• for each service s of each process class c:
  • V_{c,s,k}, the average number of visits to each device k
  • S_{c,s,k}, the average service time per request at each device k
  • V_{c,s,d,s2}, the average number of visits to each service s2 of each other server process d
[Figure 4 shows an execution sequence across USER, GUI, SERVER A, SERVER B, and SERVER C, with periods of computation, locked execution, and blocked file access annotated with middleware, database, and IPC costs.]

Figure 4 Transaction execution sequence diagram of a part of a transaction
We refer to the visit and service time information for each service of each class as a visit ratio specification. Demands are computed as:

D_{c,s,k} = V_{c,s,k} S_{c,s,k}
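As a trivial sketch (names and values illustrative, not taken from the studied design), the demand of a service at each device is the element-wise product of its visit counts and per-visit service times:

```python
def demands(visits, service_times):
    """D[k] = V[k] * S[k]: service demand at each device k (times in seconds)."""
    return {k: visits[k] * service_times[k] for k in visits}

# e.g. a service that visits the CPU 18 times at 0.5 ms per visit
# and the disk twice at 8 ms per visit
d = demands({"cpu": 18, "disk": 2}, {"cpu": 0.0005, "disk": 0.008})
```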
The population of a process in an LQM is used to specify its threading level. Note that the service times at the methods of serving processes are not specified. These values must be estimated by performance evaluation techniques. We also refer to process classes as entities and services as entries in the LQM model. The results from the MOL for LQMs include:

• for each device k:
  • U_k, the total utilization of the device
  • Q_k, the total average queue length at the device
• for each process class c:
  • U_{c,k}, the utilization of each device k
  • X_c, the throughput of the class
  • R_c, the average response time of customers in the class
  • R_{c,s}, the average response time of each service s
  • U_c, the total utilization of this process class and its services U_{c,s}
  • Q_c, the total average queue length of this process and its methods Q_{c,s}
  • Q_{c,s1,d,s2}, the average queueing delay of each method s1 at each interface method s2 of its serving process d
  • U_{c,d}, its utilization of each serving process d
[Figure 5 shows the layered model: USER entities on the Client Host make synchronous RPCs to GUI entries, which in turn call entries of SERVER A, SERVER B, and SERVER A LOCK, and make asynchronous calls to SERVER C, across the SERVER A Services Host, Managed Domain, and Object Host. Entries are annotated with service times (e.g. [1,0], [10,0]) and arcs with average numbers of visits (e.g. [18,0], [5.5,0]).]

Figure 5 Layered Queueing Model created from Figure 4
• R_{c,k}, the average residence time at each device k, including queueing delay; the residence time includes all visits

The LQM derived from the transaction execution sequence diagram in Figure 4 is shown in Figure 5. Requests are made to "entries", which are services provided by that process. Each entry has a characteristic service demand and pattern of requests to other servers. For example, in Figure 5 the entry in the GUI process has a service demand of 10 ms and it makes requests to entries on SERVER A processes and other server processes in other managed domains. The mean number of requests to other server processes is labelled on the arc between the entries. In the figure, requests are either synchronous or asynchronous, indicated using solid and dashed lines, respectively. Communication between hosts must traverse networks, causing communication overhead.
Performance evaluation techniques for QNMs [12] treat each queue separately and integrate the results to determine mean performance measures of the network as a whole. For example, exact MVA [13] is based on three equations. We remove the service subscript s to simplify the equations:

Residence time expression:

R_{c,k}(N) = D_{c,k} (1 + Q_{c,k}(N − 1_c))

Throughput calculation:

X_c(N) = N_c / (Z + Σ_k R_{c,k}(N))

Little's Law:

Q_{c,k}(N) = X_c(N) R_{c,k}(N)

where N is a vector describing the population of each customer class N_c in the QNM.
The algorithm starts with zero customers and adds one customer at a time until the full customer population is reached for each class. Approximate MVA techniques start at full customer population levels and use heuristics with lower customer population levels to estimate the results of exact MVA. Residence time expressions have been developed to reflect several types of software interactions in the models, including synchronous RPC and rendezvous, multi-threaded servers, and limited synchronization behaviour [14]. Further extensions to the model were required to develop an appropriate model for the distributed management application system in this paper. These are described in the following section.

4.1 Extensions to the LQM

To model the design, several extensions were made to the LQMs and the MOL. Some of the extensions simplified model description and made it easier to consider changes to the design; others were introduced to represent software interactions in the models and required changes to the analytic techniques. The new features that were added include:

• automatic calculation of midware and network overheads
• residence time expressions for asynchronous RPC and join behaviour
• support for hierarchical areas of model components
The first has been added for modelling convenience, the second to better capture interactions with databases, and the last to support scalability analysis. These features are described in the following subsections.

4.1.1 Midware and Network Overheads

Midware and network overheads are now reflected in the model by computing and adding service time demands to CPUs and networking elements, respectively. The demands are computed using a linear model that includes a fixed CPU cost or network latency per message sent and received, and an extra overhead time per byte sent and received. To compute these costs, the number of bytes sent and received are now parameters for each RPC. Tables of costs could have been used [10] to capture overheads more accurately, but this was not done for this study. Linear models can be specified for different midwares and networks. In Figure 6, a process named CLIENT interacts with a process named SERVER using a synchronous remote procedure call (RPC).

[Figure 6 shows (a) the high-level process model of CLIENT on Node A calling SERVER on Node B over a network, (b) the execution sequence diagram with communication processing costs and a communication delay server, and (c) the corresponding Layered Queueing Model; communication processing costs and network delay are a function of network bandwidth, network latency, and bytes transferred.]

Figure 6 Layered Queueing modelling of network communication delay

Figure 6(a) shows the interaction between a client and a server process that passes over a network. Figure 6(b) illustrates an execution sequence diagram, which was found to be very helpful in generating the LQM. Figure 6(c) shows the corresponding LQM with visits to the network delay centre and extra CPU demand for midware overhead. This approach was appropriate for our system because of its small message sizes.

4.1.2 Asynchronous Messaging and Join Delays

An example of an asynchronous RPC is illustrated in Figure 2. It differs from a synchronous RPC in that the client does not block waiting for the result of the request. Furthermore, a client's asynchronous request may face contention from its previous asynchronous
requests. To reflect this behaviour in the MVA equations, we compute the request's residence time with a modified arrival instant queue length calculation and do not include the residence time in the client's throughput calculation. The arrival instant queue length is set to the mean queue length at the current customer population. To summarize:

R_{c,k}(N) = D_{c,k} (1 + Q_{c,k}(N))

X_c(N) = N_c / (Σ_{k=1..K_sync} R_{c,k}(N) + Z)

where k = 1 to K_sync includes only the residence times for synchronous RPCs. Note that choosing the arrival instant queue length to be Q(N) treats the client's load as an open class [4] load at the server. This is similar to the open workload assumption made by Heidelberger and Trivedi [19] when modelling fork-join behaviour. With our approach we capture the relationship between client requests and asynchronous requests by modifying residence time expressions instead of using alternating submodels.

Asynchronous messaging is often accompanied by join delays. A transaction will fork several asynchronous requests, then wait for all to complete before continuing. Join delays are modelled by estimating the expected maximum of the mean response times of the corresponding asynchronous requests for service. The MOL estimates the mean response times and coefficients of variation of the asynchronous requests for service. These values are used along with the following well-known formula to estimate the join delay:

maxDelay(a, b) = a + b − 1 / (1/a + 1/b)

In this formula, a and b represent two mean request response times. It assumes the response times are exponentially distributed. Techniques are used to reflect the impact of hyper-exponentially distributed request response times [14]. Fast estimates for fork-join delays with hypo-exponentially distributed response times have been studied by others in [11]. When the join includes more than two response times, the expression is applied in a pairwise fashion. This gives a slightly pessimistic estimate for the expected maximum.

To express a join delay in the model, a visit is made to a new type of server in the LQM. It is called a barrier server, and it is a special delay server. The service time is specified by giving the names of the entries of the forked requests. The MOL uses intermediate calculations of the response times of the requests at these entries to compute the value of the join delay. Barrier servers are included in the device contention model, and the demand at this centre is recomputed with each solution of the device contention model. In the distributed management application model, there were up to 50 requests (at 50,000 managed devices) participating in join delays. It was found that these delays had a significant impact on transaction response time in the system being studied.
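The pairwise application of the expected-maximum formula can be sketched directly. The helper names are ours; the formula assumes exponentially distributed branch response times, as stated above:

```python
def max_delay(a, b):
    """Expected maximum of two exponential response times with means a and b:
    E[max] = a + b - E[min], where E[min] = 1 / (1/a + 1/b)."""
    return a + b - 1.0 / (1.0 / a + 1.0 / b)

def join_delay(branch_means):
    """Pairwise (slightly pessimistic) estimate of the join delay for a
    fork into several asynchronous requests."""
    delay = branch_means[0]
    for m in branch_means[1:]:
        delay = max_delay(delay, m)
    return delay

# Two branches with mean response times of 10 ms each:
# maxDelay = 10 + 10 - 1/(0.1 + 0.1) = 15 ms
```

For a join over many branches, as in the 50-way joins mentioned above, the pairwise estimate grows with each application, which is the source of the slight pessimism.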
4.1.3 Hierarchical Areas

The hierarchical grouping of areas is an important feature that is needed to study configurations of distributed application systems [16]. An area can represent sets of processes and devices that describe a server subsystem, a type of node in a network, a network of nodes, and so on. These are symmetries in an LQM and can be used to reduce the size of a model's description. Given an area, we can increase its number of replicates and its relationship to other areas. For example, by changing a replication value we can increase the number of client nodes in a network, divide calls across replicates of a server, or increase the number of instances of a local area network that share a wide area network. In this way the use of areas is representationally efficient. The MVA residence time and throughput expressions have been modified to exploit this information so that fast performance estimates can be provided for even very large systems. In this way the approach we present is also computationally efficient. Figure 7 illustrates two examples of LQMs and their corresponding representations when simplified using the concept of areas.

We extend the LQM with additional parameters for areas by allocating each process, device, and networking element to a single area. Furthermore, we list the parent of each area if it has one. An area can have at most one parent. The LQM now includes the following new parameters:

• A, the set of areas in the model
• R_{A_i}, the replication level of area i
• A_c, the area of process class, device, or network c
• P_{A_i}, the parent area of area i; this value may be empty
To support the relationships between areas we consider two modifications to the MVA equations. We modify the residence time and throughput expressions to take into account the replication values between areas. Essentially, each server now sees the replicates of all callers sharing its area but only sees its share of the visits to its area. The changes do not affect any of the MVA assumptions; they simply provide for representational and computational efficiency.

[Figure 7 shows two layered models and their compact area-based representations: in (a), three replicated client processes A with their CPUs calling a server B collapse into an area A with replication factor 3 over an area B with replication factor 1; in (b), top-level areas a and b (each with replication factor 1) contain a nested area c with replication factor 2, which in turn contains a nested area d with replication factor 2.]

Figure 7 Layered models and their corresponding representation using the area notion

We introduce the following scaling parameters to help characterize the relationships between a client c and server k pair:

• SP_{c,k}, the scale population factor
• SR_{c,k}, the scale response factor
• SV_{c,k}, the scale visit factor

The scale population factor SP_{c,k} is used to compute the number of clients seen by the server. It is the number of replicates of the client's area seen by the server multiplied by the client's population within its area. The scale response factor SR_{c,k} is used to compute the response times at all replicates of the server k. It is the product of the R_{A_i} of the areas traversed to reach the server k. The scale visit factor SV_{c,k} computes the client load across the replicates of the server. If the load is divided across replicates then it has the value

SV_{c,k} = 1 / SR_{c,k},

otherwise 1.0.
As an example, for Figure 7(b) we compute the following values for the area pairs. These values apply to all entities in the area pairs.

Table 1: Scaling Parameters for Figure 7(b)

  Scale Population Factor | Scale Response Factor                              | Scale Visit Factor
  SP_{a,c} = R_{A_a} = 1  | SR_{a,c} = R_{A_b} R_{A_c} = (1)(2) = 2            | SV_{a,c} = 1/SR_{a,c} = 0.5
  SP_{a,d} = R_{A_a} = 1  | SR_{a,d} = R_{A_b} R_{A_c} R_{A_d} = (1)(2)(2) = 4 | SV_{a,d} = 1/SR_{a,d} = 0.25
  SP_{c,d} = 1            | SR_{c,d} = R_{A_d} = 2                             | SV_{c,d} = 1/SR_{c,d} = 0.5
We now modify the MVA equations to take these values into account. To do this we make the population vector dependent on the client-server pair. The population of a client seen by the server becomes N_{c,k} = SP_{c,k} N_c. The resulting equations are:

  R_{c,k}(N) = V_{c,k} SV_{c,k} S (1 + Q(N_{c,k} - 1_c))

  X_{c,k}(N) = (N_c V_{c,k} SV_{c,k}) / (Z + Σ_k R_{c,k}(N) SR_{c,k})

  Q_{c,k}(N_{c,k}) = X_{c,k}(N) R_{c,k}(N) SP_{c,k}

In this set of equations, R_{c,k}(N) gives the residence time of the client at the server. We multiply V_{c,k} by the visit factor SV_{c,k} to reflect only the portion of client load directed at one replicate of the server. The denominator of the throughput calculation takes into account visits to all replicates of the server; the numerator includes SV_{c,k} to compute X_{c,k}(N) for a single replicate of the server. Q_{c,k}(N_{c,k}) computes the mean queue length of one replicate of the client entity, then multiplies this by SP_{c,k} to capture the queue length based on all replicates of the client. These values are used in subsequent residence time calculations. We note that these calculations are implemented using Linearizer. Because we use approximate MVA, it is not necessary to start with an initial population of 0.
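For a single client class visiting one server, the scaled MVA equations above reduce to a small fixed point. A sketch using a Schweitzer-style approximate MVA step rather than full Linearizer (function name and parameter values are illustrative):

```python
def scaled_amva(N_c, Z, V, S, SP=1.0, SV=1.0, SR=1.0, iters=200):
    """Fixed-point solution of the scaled MVA equations for one client
    class c at one server k (quantities are per replicate of the server)."""
    Q = 0.0
    for _ in range(iters):
        N_ck = SP * N_c                              # population seen by the server
        # Schweitzer step: Q(N - 1_c) approximated by Q * (N_ck - 1) / N_ck
        R = V * SV * S * (1.0 + Q * max(N_ck - 1.0, 0.0) / max(N_ck, 1.0))
        X = (N_c * V * SV) / (Z + R * SR)            # throughput at one replicate
        Q = X * R * SP                               # queue length over all client replicates
    return R, X, Q
```

With SP = SV = SR = 1 this collapses to ordinary approximate MVA; with a server split over two replicates, SV = 0.5 and SR = 2 leave the end-to-end demand R·SR unchanged while halving the per-replicate load.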
5.0 Validation and Scalability Analysis

In this section we verify and validate the analytic model with respect to the simulation model. The resulting analytic model is then used to study the behaviour of a system configuration as the distributed management application is scaled upwards to support many management domains.
5.1 Verification and validation

To verify the analytic model we configure models of a single node domain, a three node domain, and two interacting three node domains. The three node configurations are illustrated in Figure 8. Operator think times are set to typical values. An operator generates the three types of requests asynchronously.

Figure 8 (a) one node (b) three nodes (c) two managed domains each with three nodes
We consider two types of test cases. Parameter screening is performed to verify that the analytic and simulation models are in step. It is applied to the single node case. The second set varies the number of managed objects to determine the resulting impact of queueing and synchronization delays on transaction response times. This is our validation step with respect to the simulation. The second set of tests is applied to the three node configurations for both one and two domains. Parameter screening varies each of the more than 100 parameters of the design. They are set to 1/5 and 5 times their default values. The results show which parameter estimates affect our conclusions most. These parameters become the focus of more detailed studies, including factorial designs to consider interactions between parameters. Extra effort is taken to ensure the accuracy of the parameter estimates.
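The screening procedure just described can be sketched as follows; `solve_model` is a hypothetical stand-in for evaluating the analytic (or simulation) model and returning a transaction response time:

```python
def screen_parameters(defaults, solve_model, low=0.2, high=5.0):
    """Perturb each parameter to 1/5x and 5x its default value, and rank
    parameters by the largest absolute % change in response time."""
    base = solve_model(defaults)
    impact = {}
    for name, value in defaults.items():
        worst = 0.0
        for factor in (low, high):
            trial = dict(defaults, **{name: value * factor})
            pct = 100.0 * (solve_model(trial) - base) / base
            worst = max(worst, abs(pct))
        impact[name] = worst
    return sorted(impact, key=impact.get, reverse=True)

# Toy model for illustration: response time dominated by service demand s1.
ranked = screen_parameters({"s1": 1.0, "s2": 1.0},
                           lambda p: 10.0 * p["s1"] + p["s2"])
```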
Table 2: Parameter screening with LOW parameter set

                          Simulation                         |                          Analytic
  Param (T1)  %diff | Param (T2)  %diff | Param (T3)  %diff | Param (T1)  %diff | Param (T2)  %diff | Param (T3)  %diff
      13      -42.2 |     13      -29.0 |     37       -3.1 |     13      -46.9 |     13      -35.0 |     37       -3.0
      12      -23.9 |     45      -21.7 |     39       -2.9 |     12      -27.0 |     45      -22.3 |     36       -3.0
      25       -8.1 |     12      -17.8 |     36       -2.7 |     27       -7.7 |     12      -20.1 |     39       -3.0
      54       -7.8 |     53      -17.7 |      3       -1.9 |     25       -7.7 |     46       -5.6 |     38       -1.3
       7       -3.1 |     42       -8.3 |     44       -1.4 |      7       -4.0 |     41       -4.0 |     44       -1.3
       3       -2.2 |     46       -6.5 |     38       -1.3 |     26       -2.0 |      7       -3.4 |     18       -0.7
      28       -2.2 |      3       -5.3 |      2       -0.9 |     28       -2.0 |      2       -2.0 |     22       -0.7
      26        2.0 |     43       -4.3 |     34       -0.9 |      2       -1.5 |     29       -1.4 |     34       -0.7
      16       -1.9 |      7       -3.4 |     18       -0.8 |     11       -1.0 |      1       -1.0 |     21       -0.6
       2        1.8 |     41       -3.3 |     21       -0.6 |      1       -0.8 |     35       -1.0 |      2       -0.5
Table 3: Parameter screening with HIGH parameter set

                          Simulation                         |                          Analytic
  Param (T1)  %diff | Param (T2)  %diff | Param (T3)  %diff | Param (T1)  %diff | Param (T2)  %diff | Param (T3)  %diff
      13     2054.4 |     13     1420.4 |     23       72.5 |     13     1111.0 |     13      835.4 |     23       73.8
       3      137.4 |     43      495.7 |      3       69.5 |      4      104.3 |     46      293.0 |     20       33.1
      26      107.2 |     46      304.3 |     20       32.7 |     28       99.0 |     45      272.1 |      1       18.3
      28      102.7 |     45      281.9 |     13       19.9 |     26       99.0 |      4       75.6 |      2       18.0
      25      101.7 |      3      175.7 |      2       19.5 |     25       88.5 |     30       69.2 |     38       14.8
      12       80.9 |     30       66.8 |      1       19.1 |     27       88.5 |      1       66.9 |     44       14.8
      11       51.1 |     29       66.1 |     38       14.6 |     12       83.5 |     29       66.0 |     37       13.1
       1       44.0 |     12       57.2 |     44       14.3 |     16       70.5 |      2       65.8 |     39       13.1
       2       43.8 |      1       55.1 |     39       13.1 |      1       51.5 |     12       62.3 |     36       13.1
      16       41.5 |      2       54.7 |     37       13.0 |      2       50.6 |     16       48.8 |     21        6.7
The results of the parameter screening are presented in Tables 2 and 3. The parameters are ranked based on the percentage difference of the resulting transaction response time from the default case. For each transaction type, the tables identify the 10 parameters causing the greatest impact. All of the simulation results that are shown in this paper have a 5% confidence interval with 95% confidence level. This should be kept in mind when interpreting small differences in magnitude as reported by simulation or slight differences in order of importance when comparing with analysis. From Table 2 we see that for low parameter settings, the analytic model is in step with the simulation model. The analytic evaluation identifies 3 of the 4 top T1 parameters and 4 of
the 6 top T2 parameters. None of the parameters affected T3 by more than our simulation confidence interval.1 Table 3 gives the results for the high parameter values. The analytic evaluation identifies 8 of the 10 parameters for the three transaction types. Next, we consider the validation tests. Our three node single domain and two domain models are configured with the default parameter settings. For these models the number of managed objects is increased from 5000 objects up to 50000 objects. This range gives acceptable response times for T1, T2, and T3. Simulation and analytic results are given in Figure 9 (a), (b) and (c) for the one and two domain cases. The estimated response times for the two approaches differ by at most 15% for all cases at the 50000 managed object level. There are several differences that affect the accuracy of the analytic model with respect to the simulation model. These are as follows:
• fork-join approximation error
• the simulation model simulates sequences of activities to accomplish each transaction; the analytic model considers mean behaviour
• the simulation model uses uniform, exponential, and deterministic distributions to describe random variables; the analytic model relies on exponential and geometric distributions
Though these are significant differences between the models, Figure 9 illustrates that they are not the features affecting performance most.

5.2 Scalability analysis

In this subsection we consider a configuration of the management application with a varying number of operator nodes and two service nodes per domain. Each domain supports a fixed number of managed objects. Changing the replication values for the operator nodes and domains affects the performance of the domain and the level of traffic between domains. The domains are connected by a wide area network with a 20 millisecond latency. We consider how operators and domains can share the WAN for this configuration without exceeding the design goal of at most 20% WAN utilization for management overhead.
1. Reviewers: the impact of several of the parameters {43, 3, 4, 27, 16, 53, 42} in the simulation and analytic models still requires further study. A final interpretation will appear in the final version of the paper if accepted for publication.
Figure 9 Scalability analysis results (a) one node domain (b) three node domain (c) two managed domains each with three nodes using the WAN. Each panel plots response time (sec) for T1, T2, and T3 against the number of managed devices (x 10^4); * marks simulation results and + marks analytic results.
Figure 10 gives the results of the analysis. The WAN utilization is related to the product of the number of operators and domains. For example, the WAN is able to support up to 6 domains with 2 operators per domain without exceeding the design goal.
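The capacity check behind this observation can be sketched as below. The per operator-domain-pair utilization increment is an assumed illustrative value chosen only to reproduce the 6-domain, 2-operator example; it is not a measured figure from the study:

```python
def max_domains(ops_per_domain: int, util_per_pair: float, goal: float = 0.20) -> int:
    """Largest number of domains whose combined management traffic keeps
    WAN utilization within the design goal, assuming utilization grows
    with the product of operators and domains."""
    d = 0
    while (d + 1) * ops_per_domain * util_per_pair <= goal:
        d += 1
    return d

UTIL_PER_PAIR = 0.0166  # assumed increment per operator-domain pair (illustrative)
```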
Figure 10 Analytic model results for WAN utilization as the number of managed domains is increased, for increasing numbers of system operators. The curves plot WAN utilization against the number of managed domains (1 to 10) for 1 to 5 operators per domain.
6.0 Summary and Conclusions

We have described the basic architecture of a proposed large-scale CORBA-based distributed application design. Both simulation and analytic models were developed for the system. The simulation model was implemented using SES workbench. The analytic model was based on Layered Queueing Models. The analytic model was validated with respect to the simulation model; similar conclusions were drawn when interpreting the results of the two approaches. Extensions to the LQMs and to the Method of Layers performance evaluation technique were needed for the study. The extensions included:
• automated computation of midware overheads and network delays
• support for asynchronous RPCs and join delays
• support for hierarchical areas
The new analysis has been presented in this paper and used to study the scalability of a configuration of the design with respect to wide area network utilization. In our study we found that the solution of each analytic model required approximately 5 seconds on a 150 MHz Pentium Pro system. The simulation time for models on an HP-UX PA-RISC workstation ranged from several minutes for smaller models to more than twenty minutes for the one and two domain cases with large numbers of managed objects. We expect the simulation to take longer still when more domains are considered. The analytic model provides much more opportunity for exploring design and configuration alternatives. The simulation model can provide more confidence when answering detailed questions.

For many kinds of design and configuration questions, changing the analytic model required only a few changes to parameters in the modelling interface. However, some changes in behaviour required insight into the model and a fundamental understanding of the modelling approach. These statements are also true of the simulation approach. Many configuration changes could be made by altering parameters in its modelling interface. Some changes in behaviour required changes to the code and transaction descriptions that implemented the model. Developing simulation and analytic models in parallel was a useful exercise. It forced us to scrutinize many of the analytic and simulation modelling assumptions. It has helped to increase confidence in the model as well as in the modelling approaches. A very difficult part of the study was to keep our models in step. The simulation model is a "living" model used to study the proposed system's design. A shared representation for the system model would have been helpful.

Future work includes applying such techniques to other systems, developing techniques to improve the consistency between models, and studying how the various modelling assumptions contribute to error in predicting the behaviour of these systems. Experience is needed to determine whether mean values are adequate for studying design issues in distributed application systems and for gaining a better understanding of scalability. Simulation provides the opportunity to capture (simulated) response time distribution information; work needs to be done to validate such measures with respect to measured systems. Intuition suggests that obtaining good estimates for higher moments and percentiles is a much more difficult modelling challenge than estimating means. In the future we hope that analytic models will aid in the real-time performance management of this class of systems.
7.0 References

[1] G. Franks, A. Hubbard, S. Majumdar, D. Petriu, J. Rolia, C. M. Woodside, A Toolset for Performance Engineering and Software Design of Client-Server Systems, Performance Evaluation, June 1995.
[2] Svend Frolund, Pankaj Garg, and Allan Shepherd, On Predicting the Performance and Scalability of Distributed Applications, International Symposium on Software Engineering for the Next Generation, Japan, February 1996.
[3] Raj Jain, The Art of Computer Systems Performance Analysis, John Wiley & Sons, 1991.
[4] MAP User's Guide, Quantitative System Performance, Inc., Seattle, WA, 1982.
[5] Jerome Rolia and Ken Sevcik, The Method of Layers, IEEE Transactions on Software Engineering, Vol. 21, No. 8, pp. 689-700, August 1995.
[6] R. J. Buhr and R. S. Casselman, Use Case Maps for Object-Oriented Software Systems, Prentice-Hall, 1996.
[7] J. P. Buzen, Computational Algorithms for Closed Queueing Networks with Exponential Servers, CACM, Vol. 16, pp. 527-531, September 1973.
[8] K. M. Chandy and D. Neuse, A Heuristic Algorithm for Queueing Network Models of Computing Systems, CACM, pp. 126-133, February 1982.
[9] K. M. Chandy, J. H. Howard Jr., and D. F. Towsley, Product Form and Local Balance in Queueing Networks, April 1977.
[10] E. Pozzetti, V. Vetland, J. Rolia and G. Serazzi, Characterizing the Resource Demands of TCP/IP, B. Hertzberger, G. Serazzi (eds.), Lecture Notes in Computer Science, Springer-Verlag, No. 919, pages 79-85, 1995.
[11] C. M. Woodside, X-H Jiang, and A. Hubbard, A Fast Approximation for Mean Fork-Join Delays in Parallel Programs, submitted for publication, September 1996.
[12] E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, Englewood Cliffs, N.J., 1984.
[13] M. Reiser, Mean Value Analysis of Closed Multichain Queueing Networks, IBM Research Report RC 7023, Yorktown Heights, N.Y., 1978.
[14] J. A. Rolia, Software Performance Modelling, CSRI Technical Report, Ph.D. Dissertation, University of Toronto, Canada, January 1992.
[15] C. M. Woodside, J. E. Neilson, D. C. Petriu and S. Majumdar, The Stochastic Rendez-Vous Network Model for Performance of Synchronous Client-Server-like Distributed Software, IEEE Transactions on Computers, Vol. 44, No. 1, pp. 20-34, January 1995.
[16] G. Hills, J. Rolia, G. Serazzi, Performance Engineering of Distributed Software Process Architectures, Lecture Notes in Computer Science, Springer-Verlag, No. 977, pages 79-85, 1995.
[17] Fahim Sheikh and Murray Woodside, Layered Performance Modelling of a Distributed Database System, submitted for publication, October 1996.
[18] Oliver Ibe, Hoon Choi and Kishor S. Trivedi, Performance Evaluation of Client-Server Systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 11, November 1993.
[19] P. Heidelberger and K. S. Trivedi, Analytic Queueing Models for Programs with Internal Concurrency, IEEE Transactions on Computers, pp. 73-82, January 1983.
[20] Bao-Chyuan Jenq, Walter Kohler, and Don Towsley, A Queueing Network Model for a Distributed Database Testbed System, IEEE Transactions on Software Engineering, Vol. 14, No. 7, July 1988.
[21] Y. Bard, Some Extensions to Multiclass Queueing Network Analysis, in M. Arato, A. Butrimenko, and E. Gelenbe (eds.), Performance of Computer Systems, North-Holland, 1979.
[22] K. M. Chandy and D. Neuse, A Heuristic Algorithm for Queueing Network Models of Computing Systems, CACM, pp. 126-133, February 1982.
[23] Connie U. Smith, Performance Engineering of Software Systems, Addison-Wesley, 1990.