The Performance Impact of Workload Characterization for Distributed Applications using ARM
F. ElRayes, J. Rolia, and R. Friedrich*
Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada, K1S 5B6
{ferass,jar}@sce.carleton.ca
*Hewlett-Packard Laboratories, Palo Alto, CA, USA
[email protected]
Abstract
This paper describes operating system process, software, and business function oriented workload characterization abstractions for distributed applications. Many of these can be supported directly by ARM 2.0; suggestions are made for extensions to ARM 2.0 that support the remaining abstractions. The abstractions present different views of application performance behavior. An ARM prototype is described along with overhead measurements for several different workload abstractions. The measurements are used in a model of a hypothetical electronic commerce system to provide insights regarding the performance impact of the monitoring abstractions for more realistic workloads.
1. Introduction
Distributed application systems consist of many software processes, distributed over a network, that cooperate to accomplish overall application goals. Performance management [5] can be difficult for these systems because of the sheer number and diversity of subsystems and vendors that can contribute to such applications. Pervasive instrumentation and monitoring infrastructures are needed to characterize the behavior of these systems, and standards are needed to ensure that data from the different monitors can be coalesced and used to solve problems. The Application Response Measurement (ARM) 2.0 [2] package supports the goal of distributed application management by providing a de facto standard, relatively simple API for application instrumentation. ARM supports a performance management process where intervals of software
code are defined as performance transactions. The transactions give a model for understanding the behavior of applications. A pair of API calls wraps each transaction to generate performance events that give count and response time measures. ARM infrastructures accept these events and may pass them to performance databases. The data is then used to accomplish performance management tasks including Quality of Service (QoS) monitoring and management, performance debugging, and the construction of application oriented capacity planning models. To manage overhead, the events generated within an application process may be aggregated over a reporting period before being reported. A previous study [1] considered the performance costs of data reduction and communication associated with this kind of distributed application monitoring. The authors demonstrated that managing the duration of such reporting periods can effectively control monitoring overhead. They also concluded that the CPU time requirements for reporting the data were responsible for much of the total monitoring overhead. For efficient monitoring designs, overhead can be expected to cause CPU and network utilizations of only a few percent for even the most demanding monitoring scenarios. A key point is to collect and, more importantly, to report only the data that helps accomplish the management task at hand. To further manage overhead, an ARM implementation may support the reporting of performance information at several levels of detail and abstraction. A level of detail controls whether means, higher moments, and/or percentiles are captured and reported for events [4]. A workload abstraction determines the coarseness of the reported information; it may range from no information, through information about specific subsystems or processes, to a full trace of all events caused by each user request as it traverses the system. Each abstraction causes different overhead
and is best suited to support some subset of management tasks. Section 2 of this paper describes the ARM Application Programming Interface (API) and a prototype implementation of an ARM infrastructure. Workload abstractions based on data from ARM, and our extensions to ARM, are given in Section 3. Section 4 presents the results of a measurement study that shows the performance impact of our ARM prototype and workload abstraction levels on a sample Common Object Request Broker Architecture (CORBA) [7] application. The performance impact on a model for an electronic commerce system is also considered. A summary and conclusions are offered in Section 5.
2. A Prototype for ARM Version 2.0
This section gives a brief overview of a reference ARM infrastructure, the ARM API, and the relationship of ARM data with operating system monitors. The ARM Software Development Kit (SDK) and examples of instrumented code are available via the Web [2]. Several limitations of ARM 2.0 are described. Finally, we describe a prototype implementation of an ARM system. The ARM 2.0 API is oriented around application names, transaction names, and handles that identify individual executions of transactions. To reduce monitoring overhead, names and handles are associated with numeric identifiers that appear in reported events. We demonstrate in later sections
how managing the relationship between names and identifiers can be used to manage workload abstractions and minimize monitoring overhead for related management tasks.

2.1 Reference ARM Architecture
A reference ARM infrastructure consists of an ARM library that is linked with application processes, an ARM Agent, and an ARM Manager server. The architecture is illustrated in Figure 1. The ARM library is responsible for handling ARM API calls, managing local data structures, and reporting data to the Agent. The ARM Agent manages the relationship between ARM names and ids, accepts data from application processes, and forwards it to a performance database. The Manager is a server that provides each of its Agents with a unique identifier called an agent_instance. For most systems there is likely to be one ARM Agent per node and one Manager per ARM management domain. There could be many ARM management domains that together fully support a networked infrastructure. An Agent is aware of its own identification information. This includes its well-known vendor_id, its version, and an address specification (for example a URL) for an interface that can be used by others to correlate its ids with their corresponding names.
(Figure 1 summary: client processes, each linked with the ARM library, invoke server-method requests on a server object with methods Method1 through Method50; each process's ARM library observer method reports performance data to the ARM Agent, which obtains appl_id's, tran_id's, and its agent_instance from the ARM Manager. Measurement parameters shown: CPU demand per visit = 10 ms, visits to CPU per transaction = 3, one arm_start and one arm_stop call per transaction, one visit per transaction from client to server, client think time Z = 1 sec, and a Timer with Z = 300 sec driving the server's observer method.)
Figure 1 ARM Architecture, Measurement Setup, and Corresponding Layered Queuing Model
2.2 The ARM 2.0 API
A program instrumented with ARM goes through three stages: an initialization stage where application names and transaction names are associated with identifier values used in the monitoring, a monitoring stage where data may be reported, and a shutdown stage where an ARM library returns memory resources to the system. The ARM 2.0 API includes the following calls [2]. We provide only a summary here and refer the reader to [2] for a more detailed description of the API. The calls may be added to an application's source code or be included within a runtime environment or application framework.
• appl_id = arm_init(application_name, ...) // initialization stage
• tran_id = arm_getid(transaction_name, appl_id, ...) // initialization stage
• start_handle = arm_start(tran_id, ...) // monitoring stage
• arm_update(data_to_include_in_a_trace, ...) // monitoring stage
• arm_stop(...) // monitoring stage
• arm_end(...) // shutdown stage
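To make the stages concrete, the following minimal sketch wraps one unit of work as an ARM transaction. It follows the ARM 2.0 C binding as we understand it from [2] (note that the full calls take more arguments than the abbreviated summary above shows); the application and transaction names are hypothetical, and a vendor SDK supplies arm.h, the int32 type, and status codes such as ARM_GOOD.

/* A minimal sketch of the three ARM stages (hedged example). */
#include <arm.h>   /* provided by a vendor ARM SDK */

static void do_one_request(void)
{
    /* ... the unit of application work being measured ... */
}

int main(void)
{
    int32 appl_id, tran_id, start_handle;

    /* Initialization stage: bind names to numeric identifiers. */
    appl_id = arm_init("OnlineStore", "*", 0, 0, 0);
    tran_id = arm_getid(appl_id, "Cart::addItem::validate", 0, 0, 0, 0);

    /* Monitoring stage: one start/stop pair per transaction. */
    start_handle = arm_start(tran_id, 0, 0, 0);
    do_one_request();
    arm_stop(start_handle, ARM_GOOD, 0, 0, 0); /* ARM_GOOD = success */

    /* Shutdown stage: the ARM library releases its resources. */
    arm_end(appl_id, 0, 0, 0);
    return 0;
}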
These calls are supported by vendor-specific ARM libraries that are linked into application programs. ARM 2.0 introduces an opaque correlator data structure that links instances of transactions together. The correlator is a parameter that may be added to each Remote Procedure Call (RPC) or messaging call that causes a request to flow from one transaction to another. It includes the caller's tran_id and start_handle along with its Agent's identification information. Via this correlator, ARM 2.0 enables the collection of the information needed to support end-to-end performance debugging tasks, to deduce aspects of application structure [9], and to collect parameters needed for application oriented capacity planning models. The correlator has a correlator_type field to enable different definitions for the data structure; ARM 2.0 defines one type of correlator. We have identified several limitations of ARM 2.0. It does not provide a standard way to describe the nature of an RPC, yet model builders must know whether an RPC is synchronous, asynchronous, forwarding, or deferred-synchronous. There is no standard way to document fork-join relationships, and end-to-end behavior can be characterized only in trace mode. Furthermore, ARM does not provide information about the resource consumption of transactions. Consequently, an ARM Agent must interact with its host operating system monitor to collect the resource consumption measures necessary for performance management and correlate these with the transactions [11].
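As a rough illustration of what a correlator carries, the following C view collects the fields discussed above. The layout and field names are ours for illustration only; the normative byte format is defined in the ARM 2.0 API guide [2].

/* A logical view of ARM 2.0 correlator contents (illustrative). */
#include <stdint.h>

typedef struct correlator_view {
    uint16_t length;           /* total size of the correlator      */
    uint8_t  correlator_type;  /* ARM 2.0 defines one type          */
    int32_t  tran_id;          /* caller's transaction identifier   */
    int32_t  start_handle;     /* caller's transaction instance     */
    /* Agent identification: vendor_id, version, and an address
     * (e.g. a URL) used to map the ids back to names.              */
    uint32_t vendor_id;
    uint16_t version;
    char     agent_address[64];/* illustrative fixed-size address   */
} correlator_view;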
2.3 Prototype Design and Implementation Issues
Our prototype implementation of ARM 2.0 follows the reference architecture described in Section 2.1. The implementation centers on the notion of performance objects. A performance object is the data structure that aggregates count and response time information about ARM transaction event data and that gets reported by an ARM library to its Agent. The size of a performance object is determined by the level of monitoring detail and the information that identifies its workload abstraction. Our ARM library data structure is essentially a set of multiply linked lists that form a tree. Each item in a list is either a performance object for an identifier such as a tran_id, or a reference to another list. Each layer in this tree structure keys on an identifier that offers a further dimension for workload characterization. For example, using two lists we can characterize by both server tran_id and caller tran_id, with the caller's identification information acquired via a correlator. The path from the first list to a performance object contains the ids that give meaning to the performance object and, if needed to interpret its ids, the address of the client's Agent interface. An arm_start call allocates a temporary data structure used to maintain performance information about a transaction until its arm_stop call; this includes data from arm_update calls. When arm_stop is invoked, the ARM library searches for the corresponding performance object in its data structure (one with matching ids) and updates it, creating it if it does not exist. The update is protected by mutual exclusion. The ARM library contains an independent observer thread that sleeps for the reporting period and then prepares the data before forwarding it to the ARM Agent. Only performance objects with data values that have changed since the last interval are reported [4]. Performance objects are reported with the ids that document their meaning. Performance data access and reset operations are protected by mutual exclusion to ensure the correctness of the results. A CORBA RPC is used to transmit performance data to the ARM Agent. When ARM is enabled there are two kinds of performance overhead: the overhead of the inline ARM API calls and the associated updates of the performance objects, and the reporting costs for forwarding the performance objects to the Agent and then to the performance database. As long as ARM is enabled, the inline overhead will be present. Changing workload abstractions affects: 1) the search time for finding and updating a performance object in the tree, and 2) the number of performance objects that will be reported.
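The following is a minimal sketch of such a tree, under our own naming assumptions (the prototype's actual structures are not listed in this paper); arm_stop() would update the returned leaf under mutual exclusion.

/* Sketch of a multiply linked tree of performance objects.
 * Each list layer keys on one id dimension, e.g. server tran_id
 * then caller tran_id. */
#include <stdint.h>
#include <stdlib.h>

typedef struct perf_object {
    long   count;          /* completed transactions            */
    double resp_time_sum;  /* accumulates response times        */
    int    changed;        /* report only if updated [4]        */
} perf_object;

typedef struct node {
    int32_t      id;       /* the id for this layer             */
    struct node *next;     /* siblings within the layer         */
    struct node *children; /* the next characterization layer   */
    perf_object *leaf;     /* aggregate data; set at last layer */
} node;

/* Follow one id per layer, creating nodes as needed. */
static perf_object *find_or_create(node **layer, const int32_t *ids, int depth)
{
    for (int d = 0; d < depth; d++) {
        node *n = *layer;
        while (n && n->id != ids[d])
            n = n->next;
        if (!n) {
            n = calloc(1, sizeof *n);
            n->id = ids[d];
            n->next = *layer;
            *layer = n;
        }
        if (d == depth - 1) {
            if (!n->leaf)
                n->leaf = calloc(1, sizeof *n->leaf);
            return n->leaf;
        }
        layer = &n->children;
    }
    return NULL;
}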
By managing workload abstraction we influence the larger, reporting component of the monitoring overhead. Performance objects can be aggregated, reported, and stored at a level of abstraction that matches their intended use, thereby minimizing monitoring overhead.
In our prototype we have also introduced two new types of correlator data structures. These support the notion of business function workload classes and provide further useful workload abstractions. Like an ARM transaction, a business function workload class is identified using a bf_wl_name and a bf_wl_id; however, in our implementation we do not require a corresponding start handle. We have introduced API calls to manage these new data. The relationship between a name and its id should be managed by the ARM infrastructure in the same manner as transaction_name and tran_id pairs are managed by an ARM Agent. Ideally, the management of the business function workload class name and its id should be done in cooperation with the ARM Managers that together support the networked environment. In this way clients in different ARM domains can initiate business function requests that fall into the same classes of work. Full support for this feature requires a business function name server with a well-defined interaction protocol.

3. Supporting Workload Characterization Abstractions
We identify three workload abstractions that should receive support from the ARM standard: a process abstraction that characterizes interactions between client/server process pairs, a software abstraction that distinguishes between application software constructs such as objects and methods, and a business function abstraction that characterizes the end-to-end behavior of requests. The process workload abstraction captures count and response time information for interactions between instrumented processes. This is essentially the level of detail needed by today's commercial analytic performance evaluation packages for models of distributed applications. A software workload abstraction lets us look further into application processes to capture measures from the perspective of application level objects and services. To facilitate this abstraction we chose the following naming convention for ARM transaction names. Each transaction name is of the form:
Object_name::Method_name::Activity_name
The business function workload abstraction defines end-to-end requests. Naming conventions are also appropriate here, for example:
Business_function_name::End_to_end_request_name
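The paragraphs below explain how such names are grouped and mapped to shared tran_id values to realize each abstraction. As a concrete illustration, a hypothetical helper (the names and level semantics are ours, not part of ARM or the prototype) might derive a grouping key from a transaction name; every name sharing a key would then be issued the same tran_id.

/* Derive a grouping key from "Object::Method::Activity":
 * level 1 keeps Object only, level 2 keeps Object::Method. */
#include <stdio.h>
#include <string.h>

static void grouping_key(const char *tran_name, int level,
                         char *key, size_t key_size)
{
    strncpy(key, tran_name, key_size - 1);
    key[key_size - 1] = '\0';
    int seps = 0;
    for (char *p = key; *p; p++) {
        if (p[0] == ':' && p[1] == ':') {
            if (++seps == level) { *p = '\0'; break; }
            p++;   /* skip the second ':' */
        }
    }
}

int main(void)
{
    char key[128];
    grouping_key("Cart::addItem::validate", 1, key, sizeof key);
    printf("by object: %s\n", key);   /* prints "Cart"          */
    grouping_key("Cart::addItem::validate", 2, key, sizeof key);
    printf("by method: %s\n", key);   /* prints "Cart::addItem" */
    return 0;
}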
The software and business function workload abstractions bring the monitoring to a level that is more appropriate for distributed application management. Workload abstractions can all be managed by ARM infrastructures by controlling the relationships between names and ids. We have identified many useful workload abstractions and explain several of them in this section. The process level workload abstraction can be accomplished by associating all transaction names in a process with the same tran_id. These ids form the first level in the ARM library's tree data structure. The ARM 2.0 correlator contains the tran_id and Agent information needed to identify calling processes and to keep track of inter-process visits and response time measures; it forms the second level in our tree data structure. The software workload abstraction can be managed by grouping ARM transaction names by activity name, method name, or object name and associating each group with its own tran_id. As with the process abstraction, the correlator can be used to characterize the use of a group by its callers' groups. The correlator information forms the second level in our tree data structure. The business function workload abstraction requires new definitions for the correlator structure and a change to the correlator propagation rule. To characterize all ARM transactions by business function name or end-to-end request name, our implementation requires the initial client's correlator tran_id to be replaced with a bf_wl_id. The initial client's correlator must be propagated without modification across all transaction boundaries. The first and only level of the ARM library tree data structure then holds the bf_wl_ids. To characterize a business function by user or by some software abstraction requires a correlator that includes the bf_wl_id in addition to the information already in the ARM 2.0 correlator. The propagation rule is similar to the ARM 2.0 rule except that the initial client's bf_wl_id is always propagated. With this rule it is possible to start with a specific tran_id from
an initial client and use a breadth-first search across all performance databases to collect the performance objects that were caused by that tran_id, aggregated by bf_wl_id. The correlator in our prototype was redefined to accommodate the new 4-byte bf_wl_id without increasing its maximum size, by decreasing the maximum address size.
Next we consider a sample application along with several workload abstractions and give the potential number of performance objects used to characterize the application. The sample application consists of 10 clients calling a server process that contains one object with 50 methods. The clients are defined as having between 1 and 20 business functions. Table 1 identifies the workload abstractions, the number of business functions, the maximum number of performance objects to be reported by each instrumented client and server, and the maximum total number of performance objects for the application as a whole. The actual number of reported objects may be less, since Agents only report performance objects that have been updated during a reporting period. By Method characterizes all processes by Object_name::Method_name. By Caller characterizes each Object_name::Method_name by its calling Object_name::Method_name separately. In our example, By Business Function considers twenty different business functions for each Object_name::Method_name. End To End User characterizes each individual client's end-to-end business functions separately.

Table 1 Examples of Workload Abstractions

Case  Abstraction Level     Num. Business  Max Reported Perf.  Max Reported Perf.  Total Max Reported
                            Functions      Objects / Client    Objects / Server    Perf. Objects
1     No Instrumentation    1              0                   0                   0
2     By Method             1              1                   50                  60
3     By Caller             1              1                   500                 510
4     By Business Function  20             20                  1000                1200
5     End To End User       10             10                  5000                5100

It may be necessary to change the abstraction level for monitoring data as an application executes: more or less detailed data could be required. This can be accomplished by reassigning tran_id values, but it must be done without interfering with the application processes. It would be highly undesirable to force
another initialization phase for application processes to acquire new tran_ids from the ARM Agent. The ARM library and Agent must provide a framework that solves this problem. One solution is to use a level of indirection that maps the tran_id used in each transaction's instrumentation code to a variable that contains its current value. The true value is managed by the ARM library and used when generating correlators and when reporting performance objects. If a library is required to change abstraction levels, it can update the true values for all transactions at the end of its current reporting period. An ARM Agent could determine the levels of abstraction a process must support and provide corresponding tran_id values to the ARM library when the arm_getid() calls are made. A protocol for modifying these mappings would be useful.
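As a sketch of this indirection (our illustration of the scheme just described, not code from the prototype):

/* Instrumented code holds a stable slot index; the library owns
 * the current ("true") tran_id and may remap it between periods. */
#include <stdint.h>

#define MAX_TRANS 256

static int32_t true_tran_id[MAX_TRANS];  /* managed by the library */

/* Instrumentation resolves its slot at each arm_start call. */
static int32_t current_tran_id(int slot)
{
    return true_tran_id[slot];
}

/* At the end of a reporting period the library can switch the
 * process to a new abstraction level by rewriting the mappings,
 * e.g. pointing many slots at one shared tran_id. */
static void remap_for_abstraction(const int32_t *new_ids, int n)
{
    for (int i = 0; i < n; i++)
        true_tran_id[i] = new_ids[i];
}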
4. Experimental Design and Results
In this section we consider the performance impact of monitoring a distributed application at several workload characterization abstractions. The experiments measure increases in CPU utilization and mean client response times as the amount of information reported is increased. The ARM instrumentation and other measurements are used to construct and validate a corresponding Layered Queuing Model (LQM) [6]. LQMs are extensions to Queuing Network Models (QNMs) that capture many detailed software relationships. The model is described further in Section 4.1.1. The results of the measurements are discussed and compared with the results of the analytic model. The analytic model is then extended to consider the performance impact of the workload abstractions on a model of an electronic commerce system with more realistic workload conditions.
4.1 Measurement Based Case Study
To study the performance impact of monitoring at different abstraction levels we implemented a simple client/server CORBA application for the example described by Table 1. Ten client processes with statistically identical behavior share the server. Clients use some CPU resource, cause an average of one visit to a random method of the server, and then think. The clients and server run on the same node. This gives us a worst-case scenario in that ARM monitoring overhead will have its greatest impact on the reported results. The application ran on a single HP 9000/712 running HP-UX, with ORB-Plus 2.01 [8] as the CORBA environment. All measurement trials with enabled instrumentation make the same number of ARM API calls but differ in the number of performance objects reported to the Agent. The reporting interval was 5 minutes for each client and the server. Table 1 describes the workload abstractions we consider. The actual number of performance data objects reported depends on client throughput during the period and the number of distinct performance objects that are touched by the requests. The experimental setup is shown in Figure 1. Resource demands and visit ratios are illustrated in the figure. The total CPU demands for the clients and server per client request were an average of 30 and 32 ms, respectively. These are relatively low CPU demands for a client/server system and make our results pessimistic. The client think times were chosen to cause approximately 60% CPU utilization for the case without instrumentation. An operating system based performance monitor product named HP-UX Glance Plus [3] was enabled for all experiments.
4.1.1 Building the LQM
One of the advantages of ARM-style instrumentation is that its results can be used to support model building exercises. Our Agent accepts data from the ARM libraries, summarizes it, and outputs it to an LQM file for use with the Method of Layers (MOL) analytic solver [6]. The file contains the model's basic structure. LQMs are extended queuing network models that consider the methods of objects within processes and software blocking. To complete the model, per-process CPU demand values are acquired from Glance Plus and mapped onto the components of the LQM. Figure 1 illustrates the LQM. To model the ARM infrastructure itself it was necessary to introduce, by hand, Observer methods for the client and server and a Timer. The observer methods represent the portion of the library that reports performance objects to the Agent. The timer causes the server's observer method to be executed. For clients, the observer method has a small visit probability so that as a client chooses between its two methods it visits the observer method an average of once every 5 minutes. The Agent's CPU demand reported by Glance Plus is almost completely due to the unmarshalling of the data received from the client and server ARM libraries. Marshalling and unmarshalling are the procedures that convert data structures to and from the messages needed to accomplish interprocess communication. We used the mean Agent CPU demand per visit to estimate observer method CPU demands for both the clients and the server; their CPU demands are primarily due to marshalling overheads. The CPU demands were weighted by the number of bytes reported, so that the server's observer received a proportionally higher CPU demand than the client observers. In each case the observer CPU demand was subtracted from the total CPU demand reported for the process to get the CPU demand for the application methods. Lastly, the difference between the CPU demand of a process without instrumentation and the computed CPU demand of the application methods with instrumentation is used as a measure of the ARM library overhead. Table 2 gives the resulting values for case 4.

Table 2 Reporting and API Overheads for Case 4

Process  CPU Demand  Throughput       Avg. Reported  Reporting Overhead  API Overhead
         (ms)        (trans / 5 min)  Perf. Objects  (ms / 5 min)        (ms / trans)
Client   8386        266.79           20             3.43                1.44
Server   89763       2667.9           410.64         281                 1.68
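For illustration, a back-of-envelope sketch of this byte-weighted apportionment (all numbers are hypothetical; only the method mirrors the text):

/* Split a measured Agent CPU demand between observer methods in
 * proportion to the bytes each process reports per period. */
#include <stdio.h>

int main(void)
{
    double agent_cpu_ms = 284.0;    /* hypothetical Agent demand  */
    double client_bytes = 1600.0;   /* hypothetical, per client   */
    double server_bytes = 32000.0;  /* hypothetical, the server   */
    double total_bytes  = 10 * client_bytes + server_bytes;

    printf("per-client observer demand: %.2f ms\n",
           agent_cpu_ms * client_bytes / total_bytes);
    printf("server observer demand:     %.2f ms\n",
           agent_cpu_ms * server_bytes / total_bytes);
    return 0;
}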
Figure 2: Measured CPU Utilization and Mean Client Response Time, with MOL Estimates, vs. Average Total Number of Performance Data Objects. (Two panels plot measured CPU utilization U (%) and mean client response time R (ms), each with 95% confidence intervals, plus the MOL response time estimates, against 0, 59.7, 506.4, 610.6, and 2084.2 performance objects.)

4.1.2 Interpreting the Results of Measurement and Analysis
Figure 2 illustrates the performance impact of monitoring for the test cases. On the x-axis we list the average total number of performance objects being reported for each reporting period. Measurement points required up to 10 hours each to achieve statistical confidence. Between 40% and 100% of the total maximum possible performance objects listed in Table 1 were reported per reporting period. The results of Figure 2 show that CPU utilization is only slightly affected (±1%) by the ARM instrumentation. The differences are small and on the order of our measurement error. Mean client response times are affected by up to 8.7%. The main reason for the increase in client response times is the longer path length for each request. The ARM API calls in our CORBA implementation add between 2.2 and 3.3 ms to the path length of an end-to-end client request. After queuing for the CPU, this can be expected to add 5 or 6 ms to the client response time. The MOL analytic solver takes the LQM file as input (as explained in Section 4.1.1) and predicts mean client response times. The estimates are within the 95% confidence intervals, which validates our model of the ARM instrumentation. We reuse the model in the next section. An anomaly in the measures is the drop in measured client response times for the case with 506 performance objects with respect to the 60 object case. The CPU demands are lower; this may be due to the random nature of our experimental runs or to more efficient use of an underlying communication protocol. We are unable to explain this measurement result further at this time.
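The step from added demand to added response time is consistent with simple queuing reasoning (our reading, not a computation from the paper): treating the CPU as a single queuing resource at utilization ρ ≈ 0.6, added demand is inflated by a factor of roughly 1/(1 − ρ) = 2.5, so

ΔR ≈ ΔD / (1 − ρ) = (2.2 to 3.3 ms) / (1 − 0.6) ≈ 5.5 to 8.3 ms,

which is of the same magnitude as the 5 to 6 ms quoted above.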
4.2 Scalability Case Study for a Model of an Electronic Commerce System
In this section we consider the ARM library and reporting overheads computed in Section 4.1 in a more realistic scenario. Figure 3 illustrates a model for an example Electronic Commerce system that consists of client processes, a front end Web server that supports an HTTP daemon and a pool of server processes, a back end database that processes purchases, and a remote database server that handles inventory control. The figure illustrates the processes, transaction probabilities, and think times between transactions. The Web server processes run on a Web node, the DB process runs on a DB node, and the clients and the remote DB run on their own nodes. The Web node and DB node have processing speeds four times that of the system measured in Section 4.1. Processing demands for client transactions range between 80 and 200 ms at the client machine; corresponding transactions in the front end server pool require between 10 and 80 ms and each have 8 ms of disk demand; and the HTTP daemon requires 0.5 ms. The back end database server requires 50 ms for a purchase transaction and causes 40 ms of disk demand. The database also executes 200 ms of work every 30 seconds to communicate inventory requirements with a remote database server. We augment the above resource demands to include appropriate ARM monitoring overheads (from Section 4.1) for the workload abstractions of Table 1. We assume the maximum number of performance objects are reported for each case; for case 5 there was a
total of 3,302 performance objects reported each reporting period. Figure 4 illustrates the results for the cases with between 75 and 150 concurrent clients, with a single ARM Agent on either the Database or Web node, and with a reporting period of either 300,000 or 60,000 ms (5 or 1 min; we do not change the number of reported performance objects). The mean client response times increase by approximately 3 ms when the ARM instrumentation is enabled; however, there are insignificant differences in client response times for the different workload abstractions. The results of this case study show that process, software, and business function workload abstractions can be managed by ARM without a significant performance impact. Significant flexibility and detail are achieved without resorting to tracing, which would have a much higher overhead and excessively disrupt the distributed application environment.

5. Summary and Conclusions
ARM provides a useful framework for developing instrumentation for distributed application systems. We described a prototype implementation of ARM and presented the results of measurement experiments that demonstrate the performance impact of the prototype's ARM instrumentation on CPU utilization and mean client response times. We introduced several workload abstractions, based on process, software, and business function views, that provide detail well suited to application oriented performance management tasks, including QoS monitoring and management and the construction of application oriented capacity planning models for distributed application systems. An analytic LQM was validated with respect to the measured system. The model was extended to describe an electronic commerce system and used to predict the performance impact of the workload characterization abstractions in this more realistic environment. The results show that increasing the detail of the workload characterization does not necessarily have a significant impact on CPU utilization or client response times. This is because the actual number of reported performance objects per reporting interval is often much less than the maximum possible number, and the overheads can be small compared to total processing and communication latency. Finally, by adding a level of indirection to the ARM library's support for tran_id and adding business function workload classes to ARM, we introduced a great deal of workload characterization power and enabled on-the-fly changes to control the quantity and utility of the reported data. These changes can help increase the useful life of the underlying ARM instrumentation.
Acknowledgments
This work was generously supported by grants from Hewlett-Packard Labs in Palo Alto, and by the Natural Sciences and Engineering Research Council of Canada.
Trademarks
HP-UX and Glance Plus are trademarks of the Hewlett-Packard Corporation.
Bibliography
[1] R. Friedrich and J. Rolia, "Applying Performance Engineering to a Distributed Application Monitoring System," in Distributed Platforms, A. Schill, C. Mittasch, O. Spaniol, and C. Popien, Eds., Chapman and Hall, 1996, pages 258-271.
[2] Application Response Measurement API Guide, Second Edition, http://www.cmg.org/regions/cmgarmw.
[3] Getting Started with Glance Plus, Hewlett-Packard, Part No. B3691-90014, 1995.
[4] R. Friedrich, J. Martinka, T. Sienknecht, and S. Saunders, "Integration of Performance Measurement and Modeling for Open Distributed Processing," in Open Distributed Processing, K. Raymond and L. Armstrong, Eds., Chapman and Hall, 1995, pages 341-352.
[5] M.A. Bauer, P.J. Finnigan, J.W. Hong, J.A. Rolia, T.J. Teorey, and G.A. Winters, "Reference architecture for distributed systems management," IBM Systems Journal, Volume 33, Number 3, 1994, pages 426-444.
[6] J. Rolia and K. Sevcik, "The Method of Layers," IEEE Transactions on Software Engineering, Vol. 21, No. 8, pages 689-700, August 1995.
[7] J. Siegel, CORBA Fundamentals and Programming, Wiley, 1996.
[8] ORB Plus User's Guide, Version 2.01, Hewlett-Packard, 1996.
(Figure 3 summary: the model comprises Client processes, a Timer_Observer, a front end Server_pool and HTTP_D on the Web node, a DB process with a Timer_DB and a DB_report task on the DB node, a Remote_DB, Observer methods in each instrumented process, and the ARM Agent, which sends RPC results to the ARM performance database. Client transaction probabilities and think times: Browse = 0.435, Z = 10 s; AddCart = 0.1, Z = 1 s; RemCart = 0.05, Z = 1 s; Summarize = 0.2, Z = 5 s; Purchase = 0.1, Z = 10 s; Login = 0.05, Z = 1 s; Logout = 0.05, Z = 1 s; Observer = 0.015, Z = 0 s.)
Figure 3 A Model for an Electronic Commerce System. Clients access a front end Web server that uses transaction semantics to a back end database server to record purchase details. Periodically the database server sends inventory requirements to a remote database server.
Figure 4 reports the mean client response time (ms) by reporting period, ARM Agent node, number of client nodes, and workload abstraction:

Reporting    ARM Agent  # Client  Workload Abstraction
Period (ms)  Node       Nodes     None     ByMethod  ByRequest  ByCaller  EndToEnd
300,000      DB_node    75        239.62   242.85    242.87     242.85    242.88
                        100       245.16   248.50    248.53     248.51    248.53
                        150       259.51   264.10    264.12     264.11    264.13
             Web_node   75        240.24   243.22    243.25     243.24    243.27
                        100       245.72   249.11    249.14     249.14    249.17
                        150       260.82   265.52    265.56     265.57    265.61
60,000       DB_node    75        239.63   242.87    242.89     242.90    242.92
                        100       245.17   248.53    248.55     248.56    248.59
                        150       259.53   264.13    264.16     264.18    264.21
             Web_node   75        240.28   243.30    243.32     243.42    243.45
                        100       245.78   249.21    249.24     249.37    249.40
                        150       260.91   265.68    265.72     265.94    265.98

Figure 4 Impact of Workload Characterization Abstractions on Client Mean Response Times for a Model of an Electronic Commerce Server
[9] C.E. Hrischuk, J. Rolia, and C.M. Woodside, "Automatic Generation of a Software Performance Model Using an Object-Oriented Prototype," Proceedings of the International Workshop on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '95), Durham, North Carolina, January 1995, pages 399-409.
[10] Y. Ding, "Performance Modeling with ARM: Pros and Cons," Proceedings of CMG '97, pages 34-45.
[11] J. Rolia and V. Vetland, "Correlating Resource Demand Information with ARM Data for Application Services," to appear in Proceedings of the First International Workshop on Software and Performance (WOSP '98), Santa Fe, NM, October 12-16, 1998.