Framework for predicting the performance of component-based systems

Daniela Mania & John Murphy
Performance Engineering Laboratory, Dublin City University, Ireland
e-mail: {maniad, murphyj}@eeng.dcu.ie
Abstract: The performance of component-based systems, especially of e-commerce applications, has become a key factor in keeping business relations active. The middleware performance offered by technologies such as EJB, .NET and CORBA does not guarantee the fulfilment of performance requirements. We propose a framework that automatically builds an analytical model and drives possible performance improvements of the system under study. We outline a methodology for discovering transactions within the system at run-time.
1. INTRODUCTION

The growing use of e-commerce applications in business makes the performance of these applications an important issue. One reason is that performance metrics (e.g. response time, throughput) play an important role in attracting and retaining customers. Middleware such as Sun’s Enterprise Java Beans (EJB), Microsoft’s .NET and OMG’s CORBA Component Model (CCM) makes such applications easier to develop. These technologies help developers to meet time-to-market requirements but do not guarantee the performance of the application. Because of the increasing complexity of these systems, it is often hard for developers to clearly understand the behaviour of the application. When the system is deployed and running and the performance goals are not met, it is difficult, if not impossible, to determine where the problems are.

We propose a framework that helps developers to understand the behaviour of the system and automatically constructs an analytical model of it. The model is used to predict the performance and to drive possible improvements of the system. We are developing a technique that builds performance models from trace files generated by non-intrusive monitoring of an EJB-based e-commerce application.

The rest of the paper is organized as follows. The related work and an overview of EJB technology are presented in Section 2. Section 3 describes the overall architecture of the proposed framework. Section 4 outlines the methodologies we intend to use within the framework. Conclusions and future work are presented in Section 5.
2. BACKGROUND

2.1 Related work

There has been a significant amount of research in developing software performance models. Trivedi [1] uses stochastic Petri nets to predict software performance at the source code level. Woodside [2] uses Layered Queueing Models to predict the performance of a software design. Hrischuk [3] and El-Sayed [5] build performance models from traces.

Trace-based Load Characterization (TLC) is a technique proposed by Curtis Hrischuk [4]. This technique uses a special kind of trace called “angio-trace” [3], automatically generated from instrumentation. The “angio-trace” is based on the concept of using a “dye” to capture the cause and effect relationships for each user-defined and communication event. TLC is based on matching templates to event graphs to recognize event patterns. The shortcoming of this technique is due to the use of templates, since they are limited to the imagination of the author.

The Model Builder technique proposed by El-Sayed [5] develops performance models from traces produced by the SDL design tool. Some assumptions are made to simplify the problem. For example, it is assumed that the queues are FIFO with no priorities, which limits the ability of the technique to create models beyond this assumption.

However, none of the techniques presented above is related to EJB-based systems and none of them uses a trace file from a real-time non-intrusive monitoring system.

2.2 EJB overview

Enterprise Java Beans (EJB) technology has gained significant acceptance among enterprise development teams and platform providers. This is because EJB is a server-side component architecture that simplifies the process of building scalable, reliable and secure applications without the need to write a complex distributed component framework [6]. The Enterprise Java Beans server/container provides automatic support for middleware services such as security, transactions, persistence, remote accessibility, resource management and database connectivity, thus reducing the complexity of application development.

An Enterprise Java Beans (EJB) component consists of: the bean class, the home interface, the local interfaces (if applicable), the remote interface, and the deployment descriptor. The bean class contains the implementation details of the EJB component, and is a Java class that has a well-defined interface, which obeys certain rules. The remote interface defines the exposed business methods of the EJB. The home interface defines methods for creating, destroying and locating the beans. The local interfaces, defined in the EJB 2.0 specification [7], are high-performance versions of the home/remote interfaces that avoid the overhead introduced by Remote Method Invocation over Internet Inter-ORB Protocol (RMI-IIOP) for beans situated in the same Java Virtual Machine. The deployment descriptor is an eXtensible Markup Language (XML) file used to specify the following middleware requirements of the bean: bean management and lifecycle, persistence, transaction and security requirements.

EJB-based e-commerce applications are composed of interdependent EJBs, some developed in-house, some commercially available and others developed by the customer. The EJBs and their interconnections typically vary for each design. Performance prediction is important for such applications to help the designers select a better design and to adjust the software architecture for better performance.

3. FRAMEWORK OVERVIEW

We propose a framework that helps software developers to better understand the dynamic behaviour of their system and to improve its performance. Figure 1 outlines the main components of our framework. The main modules are: Non-intrusive Monitor, Method Invocation Path Tree (MIPT) Detector, Workload Model, Performance Model, Deployment diagram and System Optimization.

Non-intrusive Monitor module collects information about the EJB-based system under study.
The term “non-intrusive” refers to the fact that the monitoring does not require changes to the application source code or to the server implementation. The information collected refers to component instance IDs, method names and method execution times [8]. MIPT Detector module extracts the main transactions of the system. The MIPT is a node-labelled, directed, acyclic graph whose structure is based on the causal relation between events in the trace file provided by the Non-intrusive Monitor module. The arcs in the MIPT represent the “cause and effect” relationship. Workload Model consists of two components: resource usage and workload intensity. The first component describes the resource consumption caused by the workload. Typically, these resources include CPU usage, I/O activities and database access. Workload intensity refers to parameters such as the frequency distribution of the requests (i.e. the share of each request type in the total workload), the request inter-arrival distribution, etc.
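As an illustration of the workload intensity parameters, the following minimal sketch derives a request mix and a mean inter-arrival time from a list of timestamped requests. The function name, the tuple layout and the sample request names are assumptions made for this example; they are not part of the framework.

```python
from collections import Counter


def workload_intensity(requests):
    """Return (request mix, mean inter-arrival time).

    `requests` is a list of (arrival_time, request_name) tuples; this layout
    is a hypothetical one chosen for the example.
    """
    ordered = sorted(requests)
    total = len(ordered)
    counts = Counter(name for _, name in ordered)
    # Share of each request type in the total workload.
    mix = {name: count / total for name, count in counts.items()}
    times = [t for t, _ in ordered]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return mix, mean_gap


sample = [(0, "Cart.addItem"), (40, "Catalog.search"),
          (100, "Cart.addItem"), (160, "Cart.addItem")]
mix, mean_gap = workload_intensity(sample)
print(mix, mean_gap)
```

A full workload model would keep the whole inter-arrival distribution rather than only its mean; the mean is used here to keep the sketch short.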
Figure 1 Framework overview

Deployment diagram describes the configuration of the set of run-time processing nodes and the components that reside on them. This view focuses on distribution, delivery, and installation. Performance Model represents the analytical model of the system under study. It is constructed using the information provided by the MIPT Detector module, the Workload Model and the Deployment diagram. The model provides performance predictions (e.g. mean response time, throughput, utilization) when the workload characteristics or the system capacity are changed. System Optimization module drives possible improvements of the system by pointing to problems such as anti-pattern usage, inefficient algorithms and excessive object creation.

4. METHODOLOGY

4.1 MIPT detector

A trace file is a record of a sequence of events. Tracing a program helps to understand how the program executes and reveals the dynamic details of the design. A timestamp is attached to every event recorded in the trace to maintain chronological order. We assume that the chronological order of events is correct, having been achieved through clock synchronization. Since we are using a non-intrusive monitoring approach [8], the events recorded have the following fields:

1. Identification: composed of the component instance ID (the EJB component name and the instance number) and the method ID (the name of the method called from the specified EJB component instance);
2. Current time (timestamp): the time when the method was called;
3. Duration: the cumulative total time spent in that method and in any method that it calls;
4. Actual parameters (optional): the values of the parameters with which the method was called.
Figure 2 presents an example of such a trace file.
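The four fields of a trace event can be captured in a small record type. The sketch below is only illustrative: the class name, the field names and the millisecond unit are assumptions, and it does not reproduce the layout of the actual trace file. Because the duration is cumulative, the interval of a called method lies inside the interval of its caller, which the `encloses` helper checks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraceEvent:
    """One row of the trace file; field names are illustrative assumptions."""
    ejb_name: str       # EJB component name (part of the identification)
    instance_no: int    # instance number (part of the identification)
    method: str         # name of the method called
    c_time: int         # time when the method was called (ms assumed)
    duration: int       # cumulative time spent in the method and its callees
    params: Optional[Tuple[str, ...]] = None  # actual parameters (optional)

    @property
    def end_time(self) -> int:
        """Time at which the method returned."""
        return self.c_time + self.duration

    def encloses(self, other: "TraceEvent") -> bool:
        """True if `other` starts and ends within this event's interval."""
        return self.c_time <= other.c_time and other.end_time <= self.end_time


outer = TraceEvent("Cart", 1, "checkout", c_time=100, duration=50)
inner = TraceEvent("Order", 1, "create", c_time=110, duration=20)
print(outer.end_time, outer.encloses(inner), inner.encloses(outer))
```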
Figure 2 Example of the trace file

The fact that we have only one type of event (i.e. the receive event) is the main difficulty in constructing the MIPT. This is due to the way in which the non-intrusive monitoring is done [8]. Since we cannot capture the context of the method invocation, we do not know the caller of an EJB’s methods. However, we can obtain all the EJBs that can potentially call a method of another EJB by using the EJB references declared in the EJB deployment descriptors. From the EJB references, we determine which EJBs are called by the EJB that declares them in its deployment descriptor. Session and entity beans use synchronous communication with remote method invocation (RMI). This means that if an EJB calls another EJB, the first one is blocked until it receives a reply from the second one.

By reading the deployment descriptors of all EJBs within the system, we can determine: $S_{EJB}$, the set of all EJBs; $S_{meth}$, the set of all methods (each method ID contains the name of the EJB to which the method belongs); and $A \in M_{n \times n}$, the EJB matrix, where $n = |S_{EJB}|$ is the total number of EJBs in the system and

$$
\begin{aligned}
A(i,i) &= 0, && 0 \le i < n \\
A(i,j) &= \begin{cases} 1, & EJB_j \in \{\text{ejb-ref}(EJB_i) \cup \text{ejb-local-ref}(EJB_i)\} \\ 0, & \text{otherwise} \end{cases} && i \ne j,\ 0 \le i, j < n
\end{aligned}
\tag{1}
$$

For each pair $(EJB_x, EJB_z)$ with $A(x,z) = 1$ we define the “calling” probabilities matrix $P_{x,z} \in M_{m \times q}$ ($m$ and $q$ are the numbers of methods defined by $EJB_x$ and $EJB_z$ respectively). The element $P_{x,z}(i,j)$ defines the probability that method $i$ ($0 \le i \le m-1$) from $EJB_x$ calls method $j$ ($0 \le j \le q-1$) from $EJB_z$:

$$
P_{x,z}(i,j) = \begin{cases} 0, & EJB_x.Meth_i \nrightarrow EJB_z.Meth_j \\ p, & EJB_x.Meth_i \rightarrow EJB_z.Meth_j \end{cases}
\tag{2}
$$

The first EJBs along the invocation path will be the EJBs $EJB_j$ that satisfy the relation:

$$
A(i,j) = 0 \quad (\forall\, 0 \le i < n)
\tag{3}
$$

In order to determine the possible transactions within the system we propose the following methodology:
A) Transaction file - building step:

1. Select the first method from the trace file. Set a variable ct = cTime of the method selected.
2. Extract all the rows until cTime > ct + Duration of the method selected in the previous step. These rows represent one possible transaction that begins with the method selected in step 1.
3. Parse the trace file in order to detect another occurrence of the method selected in step 1 and set ct = cTime of the new occurrence found.
4. Repeat steps 2 and 3 until the end of the trace file.

In this manner, we build another file, called the transaction file, which contains all possible transactions that begin with the method selected in step 1. We repeat the procedure presented above, selecting as the first method each of the methods defined in the EJBs that satisfy (3).

B) Transaction file - refining step:

1. Select the first “transaction” from the transaction file.
2. Parse the “transaction” selected from bottom to top and eliminate the methods that satisfy the following conditions:
   a. The methods within the same EJB as the first method of the transaction (cycles are not allowed; one method is not allowed to call another method from within the same EJB);
   b. The methods x that have the sum of the current time and duration greater than or equal to the sum of the current time and duration of the first method of the transaction: (cTime + Duration)_x ≥ (cTime + Duration)_1;
3. Construct in a backtracking manner all possible “real transactions”, taking into account the cTime and Duration of each method.
4. Eliminate the “real transactions” obtained before that do not conform to the set of constraints (C) specific to the technology used.
5. Select the next “transaction” from the “transaction file” and repeat steps 2, 3 and 4 until there are no more “transactions” to be processed.
Repeat the above procedure for each “transaction file” obtained in step A. The set C contains, for the EJB technology, at least the following constraints:

i. a method from $EJB_i$ is the caller of another method from $EJB_j$ if and only if $A(i,j) = 1$, $\forall\, 0 \le i, j < n$;
ii. if a method from $EJB_i$ has called a method from one particular instance of another $EJB_j$, then all the subsequent methods called from the same $EJB_j$ will be part of the same instance.
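Parts of the procedure above can be sketched in a few lines. The sketch below covers the EJB matrix of (1), the entry EJBs of (3), step A, and the elimination of step B.2; the backtracking of step B.3 and the constraint check of step B.4 are not attempted. All names, the dictionary that stands in for the parsed deployment descriptors, and the sample trace are hypothetical. The sketch also assumes that a method is eliminated in step B.2 when it satisfies either condition a or condition b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ejb: str
    method: str
    c_time: int
    duration: int

    @property
    def end(self):
        return self.c_time + self.duration


def build_ejb_matrix(ejb_names, references):
    """Matrix A of (1); `references` maps an EJB to the EJBs it declares."""
    index = {name: i for i, name in enumerate(ejb_names)}
    a = [[0] * len(ejb_names) for _ in ejb_names]
    for caller, callees in references.items():
        for callee in callees:
            if caller != callee:          # A(i, i) stays 0
                a[index[caller]][index[callee]] = 1
    return a


def entry_ejbs(ejb_names, a):
    """EJBs satisfying (3): no other EJB declares a reference to them."""
    return [name for j, name in enumerate(ejb_names)
            if all(row[j] == 0 for row in a)]


def build_transaction_file(trace, first_ejb, first_method):
    """Step A: one candidate window per occurrence of the first method."""
    trace = sorted(trace, key=lambda e: e.c_time)
    windows = []
    for idx, start in enumerate(trace):
        if (start.ejb, start.method) != (first_ejb, first_method):
            continue
        window = [start]
        for event in trace[idx + 1:]:
            if event.c_time > start.end:  # cTime > ct + Duration
                break
            window.append(event)
        windows.append(window)
    return windows


def refine(window):
    """Step B.2: drop same-EJB methods (a) and methods ending too late (b)."""
    first = window[0]
    kept = [e for e in window[1:]
            if e.ejb != first.ejb and e.end < first.end]
    return [first] + kept


ejbs = ["Cart", "Order", "Catalog", "Account"]
refs = {"Cart": ["Order", "Catalog"], "Order": ["Account"]}
A = build_ejb_matrix(ejbs, refs)
entries = entry_ejbs(ejbs, A)

trace = [
    Event("Cart", "checkout", 0, 100),
    Event("Order", "create", 10, 40),
    Event("Account", "debit", 20, 20),
    Event("Cart", "total", 60, 10),       # same EJB as the first method
    Event("Catalog", "lookup", 90, 30),   # ends after the first method
    Event("Cart", "checkout", 200, 50),
    Event("Order", "create", 210, 20),
]
windows = build_transaction_file(trace, "Cart", "checkout")
refined = refine(windows[0])
print(entries, len(windows), [e.method for e in refined])
```

The order in which step B.2 scans the rows (bottom to top in the text) does not affect the result of this filter, so the sketch scans top to bottom.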
Step 4 from the above procedure can be combined with the frequent pattern algorithms used in data mining. We define the probability matrix $P_i \in M_{k \times k}$ ($k$ = number of methods involved in the transaction $T_i$) for each transaction detected in the previous steps. $P_i(m,n)$ represents the probability that method $m$ calls method $n$ within transaction $T_i$.

The MIPT is built using the information obtained from the methodology presented. In Figure 3, we present an example of an MIPT. The graph nodes are the methods that could be part of the transaction. They are annotated with metrics such as the mean response time and the method call count. We consider that time proceeds from top to bottom and from left to right. In order to determine a transaction we have to traverse the graph in a prefix (pre-order) manner. The arcs represent the “cause and effect” relation between methods. Each arc has an associated probability $P_i(m,n)$, as defined above. The nodes in gray are the ones that are not likely to be part of the real transaction. The information gathered in the defined matrices is used to build an analytical model of the system. The initial trace file provides much more information (e.g. method call counts, mean response times, inter-arrival time distribution, etc.).
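The text does not state how the entries of $P_i$ are estimated. One possible reading, shown below purely as an assumption, is a relative frequency: the share of the reconstructed candidate “real transactions” in which method $m$ appears as the caller of method $n$. The function name and the sample arcs are hypothetical.

```python
from collections import Counter


def estimate_call_probabilities(candidates):
    """Relative frequency of each (caller, callee) arc over all candidates.

    `candidates` is a list with one collection of (caller, callee) arcs per
    reconstructed candidate transaction. This estimator is an assumption
    made for illustration, not the method defined in the paper.
    """
    counts = Counter(arc for arcs in candidates for arc in set(arcs))
    total = len(candidates)
    return {arc: count / total for arc, count in counts.items()}


candidates = [
    {("checkout", "create"), ("create", "debit")},
    {("checkout", "create"), ("checkout", "debit")},
    {("checkout", "create"), ("create", "debit")},
    {("checkout", "create"), ("create", "debit")},
]
P = estimate_call_probabilities(candidates)
print(P)
```

Under this reading, arcs with a low estimated probability would correspond to the nodes that are unlikely to belong to the real transaction.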
Figure 3 Example of MIPT

However, these are starting points in developing and elaborating an improved statistical methodology to automatically build performance models for running EJB-based e-commerce systems.

4.2 Performance Models

Queueing models have been used successfully in software systems. A shortcoming of these models is that they can only model software components that demand resources one at a time. Simultaneous resource demands and parallel subpaths require more sophisticated models, such as Extended Queueing Networks, Performance Petri Nets, Stochastic Activity Networks, and Stochastic Process Algebras.

Layered Queueing Networks (LQN) [8,9] are an adaptation of the Extended Queueing Network, defined especially to represent the fact that software components are executed on top of other layers of servers and processors, giving complex combinations of simultaneous resources. LQN defines a system in terms of its objects (software and hardware). The objects are represented as tasks that are divided into three categories: client tasks (which only send requests), active server tasks (which can receive and send requests) and pure server tasks (which only receive requests). Each task accepts service requests as entries that can correspond to methods exposed by the software objects [10].

We use LQN as a modelling approach. The LQN model has been used to study the performance of distributed software systems [4,11]. It is able to provide performance metrics such as response time, throughput and utilization. Moreover, it can detect software bottlenecks, software queueing effects, etc.

We present the method invocation procedure specific to EJB technology in Figure 4 and then its LQN model in Figure 5. A client (e.g. JSP, servlet, EJB) can invoke a business method of an EJB if it has a reference to an EJBObject of the bean. To obtain this reference, it looks up the bean’s home interface via the Java Naming and Directory Interface (JNDI). The home bean factory creates an EJBObject and returns the EJBObject reference to the client. The client invokes a business method on the EJBObject, which delegates it to the actual bean.
Figure 4 Accessing an EJB method

Figure 5 shows the LQN model of the “client calling an EJB” in the case that all the parts involved have their own CPU.

Figure 5 LQN model for accessing an EJB method

The model tasks have a value corresponding to their CPU demands. Each arrow has an associated value that is the average number of calls to the entry being pointed to.

5. CONCLUSION

The performance of component-based systems, especially EJB-based systems, is a relevant issue. Developers are restricted by time-to-market requirements. Therefore, they do not always follow a standard software process in developing software systems. Moreover, performance issues are not addressed until later, when all the design decisions have already been taken. We propose a framework that automatically discovers the performance problems and drives possible improvements of the system under study. We outlined a methodology for detecting system transactions from a trace file that is provided by the monitoring module. Future work includes improving the MIPT detection techniques and implementing these techniques to automatically build analytical models of component-based systems.

ACKNOWLEDGMENT

This research is being supported by Enterprise Ireland’s Informatics Research initiative, to whom we are very grateful.

REFERENCES
[1] G. Ciardo and K.S. Trivedi, “A decomposition approach for stochastic reward net models”, Performance Evaluation, 18(1):37-59, 1993.
[2] C.M. Woodside and G. Ragunath, “General Bypass Architecture for High-Performance Distributed Algorithms”, Proc. 6th IFIP Conference on Performance of Computer Networks, Istanbul, Oct. 23-26, 1995, in “Data Communications and their Performance”, eds. S. Fdida and R.U. Onvural, Chapman and Hall, 1996, pp. 51-65.
[3] C. Hrischuk, C.M. Woodside, J. Rolia and R. Iversen, “Trace-based load characterization for generating software performance models”, IEEE Transactions on Software Engineering, vol. 25, no. 1, January 1999.
[4] C. Hrischuk, J. Rolia and C.M. Woodside, “Automated generation of software performance model using an object-oriented prototype”, International Workshop on Modelling and Simulation, Analysis, Simulation of Computer and Telecommunication Systems (MASCOTS ‘95), pp. 399-409, Durham, NC, 1995.
[5] Hesham M. El-Sayed, “A Framework For Automated Performance Engineering of Distributed Real-Time Systems”, PhD thesis, Dept. of Systems and Computer Engineering, Carleton University, 1999.
[6] Ed Roman, S.W. Ambler and T. Jewell, “Mastering Enterprise JavaBeans”, second edition, John Wiley & Sons, Inc., 2002.
[7] Sun Microsystems, “Enterprise JavaBeans Specification, version 2.0”, August 2001.
[8] Adrian Mos and John Murphy, “Performance Monitoring Of Java Component-Oriented Distributed Applications”, IEEE 9th International Conference on Software, Telecommunications and Computer Networks - SoftCOM 2001, Croatia/Italy, October 9-12, 2001.
[9] J.R. Rolia and Kenneth Sevcik, “The method of layers”, IEEE Transactions on Software Engineering, vol. 21, no. 8, pp. 689-700, 1995.
[10] M. Woodside, “Layered Performance Modeling and Layered Queueing: Quick Tutorial”, Carleton University, April 2001.
[11] Te-Kai Liu, Santhosh Kumaran and Zongwei Luo, “Layered Queueing Models for Enterprise Java Beans Applications”, IBM Research Report, June 2001.