Design-Level Performance Prediction of Component-Based Applications


Yan Liu, Member, IEEE, Alan Fekete, Member, IEEE Computer Society, and Ian Gorton, Member, IEEE

Abstract—Server-side component technologies such as Enterprise JavaBeans (EJBs), .NET, and CORBA are commonly used in enterprise applications that have requirements for high performance and scalability. When designing such applications, architects must select a suitable component technology platform and application architecture to provide the required performance. This is challenging as no methods or tools exist to predict application performance without building a significant prototype version for subsequent benchmarking. In this paper, we present an approach to predict the performance of component-based server-side applications during the design phase of software development. The approach constructs a quantitative performance model for a proposed application. The model requires inputs from an application-independent performance profile of the underlying component technology platform, and a design description of the application. The results from the model allow the architect to make early decisions between alternative application architectures in terms of their performance and scalability. We demonstrate the method using an EJB application and validate predictions from the model by implementing two different application architectures and measuring their performance on two different implementations of the EJB platform.

Index Terms—Quality analysis and evaluation, software architectures, performance measures.

1 INTRODUCTION

COMPONENT-BASED server-side technologies have proven successful in the construction of enterprise-scale systems. A range of technologies such as Enterprise Java Beans (EJBs), CORBA, and COM+/.NET support the design and deployment of application components in a component container environment. The container provides essential support for aspects such as distributed communications, messaging, persistence, transactions, and security [12]. It also implements concurrency mechanisms so that multiple instances of components can be utilized simultaneously, making it possible for a component-based application to execute thousands of transactions per second (tps).

Container behavior for component technologies is configurable in external deployment descriptors. For example, the level of concurrency in the container can be configured by adjusting the size of the thread pool that services component requests. These configuration settings remain transparent to the application component code. This separation of concerns significantly reduces the complexity of coding a distributed application, and is a major reason for the success of these server-side component technologies.

A simplified depiction of an EJB component container is shown in Fig. 1. More detail on the EJB architecture can be found in [13].


An implication of this server-side component architecture is that it is not possible to execute the application component code outside of a compatible container environment. The application component's behavior is a combination of the application-specific code and the underlying container services it utilizes. This creates a challenge when designing an application to meet specified performance levels, as the performance of an application depends on the following [13]:
- the design and implementation of its application-specific components and their interactions,
- the implementation of the component container,
- the configuration settings for the containers,
- the attribute settings of both the application components (e.g., persistent attribute of EJBs) and the container (e.g., the transaction isolation level of the container), and
- the simultaneous request load.

For standards-based component technologies like EJB, multiple competing implementations of the standard exist. A body of empirical evidence clearly demonstrates the influence of the component container implementation on overall application performance [14]. These results establish that the container implementation and the operating system/hardware platform must be taken into account in order to be able to make useful application performance predictions. Consequently, current best practice to address these problems is to build a prototype and stress test it on the target platform. For a complex application, this is expensive and time-consuming, especially when competing component containers and application designs need to be assessed.


Fig. 1. Enterprise JavaBean container environment.

In this paper, we describe a performance modeling approach that can predict application performance from:
1. an application-independent performance profile of the component container and underlying operating system/hardware, and
2. a design description of the application.

Our methodology enables an architect to produce numeric values for predicted performance under various workloads. Also, the solution of the performance model leads to insights which can improve the design. The insights can be reflected in answers (see [24], [25]) to questions such as:
1. What are the maximum load levels the system can handle? If client load increases, what extra hardware is needed to maintain the required performance?
2. What are the average response time, throughput, and resource utilization under the expected workload?
3. Which components have the largest effect on the performance and are they potential bottlenecks?
4. What performance benefit is obtained by choosing a different application architecture?

This approach avoids the need for a prototype implementation since we can determine the overall form of the performance equation from the design description. We can then estimate the numeric parameters of the equation by measuring the actual container performance with an application-independent benchmark. This is much simpler in both code and architecture than any useful application, and, importantly, its measurements are reusable for any application that is built on the benchmarked container platform.

2 RELATED WORK

This research fits within the goals of prediction-enabled component technology [19]. Our work is more narrowly focused than [19] as we deal only with performance rather than general quality attributes. However, models for the class of container-hosted applications our work addresses must deal with the interactions between the container and application components. A particular contribution of this paper is our ability to disentangle the influence of the performance characteristics of the container from those of the application’s architecture. This is achieved without the need to inspect or instrument the container code itself.

Our work is also related to analytical performance modeling, based particularly on queuing theory [20], [22], [29], [32]. These techniques have been applied to component-based applications [28]. One approach is to model the complete system at a very detailed, physical level [9]. In contrast, others have worked at the software architecture level [2], [21], [23]. These efforts generally lead to a performance model giving the overall form of the equation. However, a quantitative prediction requires application-specific parameters, which can be obtained only from a substantial prototype implementation.

Performance model-based techniques can be used at an early stage of design. The software performance engineering (SPE) community has for a number of years advocated the need to integrate design and performance modeling activities [31]. The design method is based on use cases, object modeling, and functional modeling using UML. Many approaches translate architecture designs in UML to analytical models, such as queuing network models (QNM) [5], [33], stochastic Petri nets [4], [27], or stochastic process algebras [7]. A survey on model-based performance prediction approaches and their tool support can be found in [2]. In these approaches, the application workflow is presented in a sequence or state diagram, and a deployment diagram is used to describe the hardware and software resources. Importantly, the component container, its communication patterns, and its performance properties are not explicitly modeled. These approaches therefore generally ignore or greatly simplify the crucial details of the underlying container performance. As a result, the models are rather inaccurate or nonrepresentative. Harkema et al. [16] developed a simulated QNM of CORBA middleware, but the work is specific to the threading structure of a CORBA server. An analytical model was developed in [18], addressing the blocking problem of EJB applications. The model was verified by simulation, and the complexity of the model limits its use in predicting the performance of a real EJB application. Hence, little work has been done to integrate the performance characteristics of component containers into prediction methods at the design level.

Another problem with these approaches is that explicit values for parameters are required to solve the models, such as the CPU time that each operation consumes. However, these performance parameters cannot be accurately estimated during application design. A common practice therefore is to build a prototype and use this


to obtain measures for the values of parameters in the model. For a complex application, this is expensive and time-consuming.

Another thread of research has been aimed at more qualitative prediction [1]. This is based on detailed measurements of a benchmark application, but the outcome is insight into aspects of a design with performance consequences [1], [13]. Progress has been made to reduce the prototyping effort with tool support for automatic generation of test beds [6], [8]. Although prototype testing can produce empirical evidence of the suitability of a platform and/or an architecture design, it is inherently inefficient in predicting performance as the application architecture inevitably evolves. Under change, the test bed has to be regenerated and redeployed, and the measurement has to be repeated.

A preliminary version of this paper appeared as [24]. Several other case studies and more details are given in [25].

3 THE PERFORMANCE PREDICTION METHODOLOGY

3.1 Overview
A performance prediction approach for component-based applications needs to encompass the following aspects.

First, the performance model should explicitly represent the container components and processes that interact with application components and the database where persistent state resides. This includes:
- the handler that receives and dispatches requests to their required services, and associated activities, including initializing a component instance,
- the container mechanism for activation and passivation of components, and
- the database activity.
These elements must be modeled because contention for resources causes bottlenecks in the container, while the execution time of the application components is often negligible.

Second, architectural patterns supported by the container implementation depend on the settings of the container and application components. For example, when compared to a stateful component, a stateless component requires less processing time from a container to manage its lifecycle. These different settings affect the behavior of the container, and the service time of a request varies accordingly. Therefore, the service time of a request should be modeled as a function of those performance parameters affected by the settings of attributes of interest.

Third, an application-independent performance profile of the container is required. The container and the operating system/hardware platform must be taken into account to be able to make accurate performance predictions. As the platform characteristics must be determined at design time, before we have implemented the application, we need to measure these characteristics in an application-independent way, through a benchmarking activity.

To this end, our approach combines performance modeling and benchmarking. We follow five phases in our methodology, each producing a distinct artifact:

1. The Queuing Model. We first create a Queuing Network Model for a component container. This requires us to identify the main components of the container, and note where queuing delays occur. We abstract details of the container components and their communication, resulting in a model with a simple structure. Standard queuing theory techniques enable us to obtain a performance prediction from the model, once we determine the service demand distribution characteristics of the queues in the model. Section 3.2 shows the queuing model for an EJB container. In [26], we describe a model for JMS technology.

2. Architectural Pattern. In this phase, we consider the processing steps taken by the application during the processing of a single request. Choosing an architecture involves deciding on a set of components, their container attribute settings, and their communication pattern [15]. All of these elements have an impact on the container resources and service demand. A given architectural pattern is described by an activity diagram that captures the steps performed in processing a request. This can then be converted into an expression for the service demand placed on the components in the architecture for each queue in the system. Section 3.3 demonstrates this conversion. In [25], we describe a similar analysis for a pattern based on optimistic concurrency control.

3. Parameterized Performance Model for the Application Design. For a given application design, we can determine the demand the components in the design place on the container resources. This will depend on the application usage profile and business logic, which tell us how often methods are called, and what operations are performed by each method. By using the pattern analysis appropriate to the architectural design, we obtain a formula expressing the overall service demand for each queue in terms of the characteristics of the platform (such as the time to initialize a component). In Section 3.4, we present this calculation, and illustrate it for two different designs for Stock-Online.

4. Platform Performance Profile. The above produces a performance prediction for the designed system in the form of an equation with parameters relating to the container infrastructure. Some of the parameters represent tunable features of the container's configuration, such as thread pool size, but others reflect internal, hidden implementation details of the container. We therefore implement a simple application with minimal business logic on the target container, and measure its performance. To obtain the values of some nonmeasurable parameters, we perform steps 1-3 above for the benchmark application itself, obtaining a parameterized performance model for it and predicting performance in terms of the container's characteristics. Since we know the measured performance of the benchmark application on the container, and values for some of the parameters, we solve the performance model corresponding to the simple benchmark application and determine values for the missing parameters. The result is the performance profile of the container running on the specific operating system/hardware platform. This step is explained in Section 3.5.

5. Quantitative Performance Prediction. The performance profile of the container provides concrete values which we can use for the parameters in the performance model of the application design. Thus, we have a queuing model with explicit service demands for each queue, representing the designed application running on the profiled container. This queuing model can be solved using standard queuing theory to provide a quantitative prediction for the application performance. This is discussed in Section 3.6.

Fig. 2. A QNM of an EJB server.

3.2 Container Queuing Model
A performance model for a component-based design should capture the container behavior when processing each request from a client. This is because the dominant impact on performance in many applications comes from the delays and costs associated with the container's activity, rather than from executing the business logic. For this reason, we focus on modeling the behavior of the container in processing method invocations on the components it hosts. As containers can process multiple simultaneous requests, the threading model utilized must also be represented.

The QNM in Fig. 2 models the main components and their interactions in a single EJB container. The model comprises a closed QNM, as the EJB component model utilizes synchronous communication protocols, with the EJB client blocking until the response to its request arrives. A closed QNM is appropriate for components using synchronous communication, as containers employ a finite thread pool that limits the maximum requests active in the server. In this model, application clients represent the "proxy" client of the EJB container (as opposed to clients driven by human interaction, proxy clients such as servlets continually handle requests that arrive at a Web server and do not "think" between requests). A client is considered as a delay resource and its service time equals the thinking time between two successive requests. A request to the EJB container is interpreted and dispatched to an active container thread by the request handler. The request handler is modeled as a single-server queue with no load dependency because, although the request handler may be multithreaded, the number of threads allocated is controlled by the container's internal implementation to achieve optimal performance and is not available for configuration. It is annotated as the Request queue in the QNM.

The container is multithreaded, and threads are pooled to minimize the cost of creation. The size of the thread pool is configurable. Therefore, the container is modeled as a multiserver queue with m servers without load dependency, where m is the thread pool size. It is annotated as the Container queue in the QNM.

The database clients are the EJBs that handle the request. EJB containers provide a database connection pooling mechanism to facilitate connection reuse across requests. Database access is therefore modeled as a delay server with load dependency, annotated as the DataSource queue. The maximum number of connections is k, which sets the upper bound on active connections in the database at runtime. We can determine the value of this from other information, by

$$k = \min(\text{database connection pool size},\ \text{EJB server thread pool size}).$$

The operation time at the database tier contributes to the service demand of the DataSource server. If there is no free database connection available for a request, the request is put into a queue waiting for a free database connection.

Once we have explicit values for the service demands at each queue, this general QNM becomes explicit, and can be solved using Mean Value Analysis (MVA) for closed QNMs [28] (p. 212). The data we need to solve the model are:
- N: average requests/client population in the QNM,
- m: size of the EJB thread pool (recall that the number of database connections k is determined from m), and
- D_i: service demand of a single queue server, i.e., the amount of time required for a request to be served at queue i. In this paper, we use subscripts to denote the three queues: subscript 1 refers to the Request queue, subscript 2 to the Container queue, and subscript 3 to the DataSource queue.

The performance metrics of interest which can be obtained from the QNM solution include:
- X_0: average throughput of the queuing network,
- R: average response time, equal to the total residence time over all queues, R = N/X_0, and
- u_i: average utilization of queue i, defined as the fraction of time the resources of queue i are busy.

The implementation of an EJB container is complex and vendor specific. This makes it difficult to develop a performance model that covers all the relevant implementation-dependent features, as the EJB container source code is usually not available. For this reason, our quantitative model only covers the factors that impact application performance, and ignores factors that are less performance sensitive.
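To make the solution step concrete, the following sketch applies exact MVA to a closed network of this shape. It is an illustration rather than the authors' tooling: every center is treated as a load-independent single-server queue, so the multi-server Container queue (m threads) and the load-dependent DataSource queue of Fig. 2 would need the load-dependent MVA variant described in [28]; the class name and the sample demands are assumptions.

```java
// Illustrative exact MVA for a closed queuing network with zero think time.
// Each center is treated as a load-independent single-server queue; the
// multi-server Container queue and the load-dependent DataSource queue of
// Fig. 2 would need the load-dependent MVA extension covered in [28].
public final class ClosedMva {

    /**
     * @param demands    service demands D_i per center (seconds), e.g. {D1, D2, D3}
     *                   for the Request, Container, and DataSource queues
     * @param population number of clients N
     * @param thinkTime  client think time Z (zero for proxy clients)
     * @return {X0, R, U_1..U_k}: throughput, response time, and utilizations
     */
    public static double[] solve(double[] demands, int population, double thinkTime) {
        int k = demands.length;
        double[] queueLength = new double[k];   // Q_i(n-1), initially 0
        double throughput = 0.0;
        double responseTime = 0.0;
        for (int n = 1; n <= population; n++) {
            double[] residence = new double[k];
            double totalResidence = 0.0;
            for (int i = 0; i < k; i++) {
                residence[i] = demands[i] * (1.0 + queueLength[i]); // arrival theorem
                totalResidence += residence[i];
            }
            throughput = n / (thinkTime + totalResidence);          // X0(n)
            for (int i = 0; i < k; i++) {
                queueLength[i] = throughput * residence[i];          // Little's law per center
            }
            responseTime = totalResidence;                           // R(n)
        }
        double[] out = new double[2 + k];
        out[0] = throughput;
        out[1] = responseTime;
        for (int i = 0; i < k; i++) {
            out[2 + i] = throughput * demands[i];                    // u_i = X0 * D_i
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical demands in seconds; real values come from the performance
        // profile of Section 3.5.
        double[] m = solve(new double[] {0.001, 0.004, 0.003}, 100, 0.0);
        System.out.printf("X0 = %.1f req/s, R = %.3f s%n", m[0], m[1]);
    }
}
```

Given such a solver, the capacity-planning question of Section 1 reduces to increasing N until the predicted R exceeds its target.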

3.3 Architectural Pattern Analysis
In order to turn the general QNM into an explicit one that can be solved for the desired performance metrics, we need


to determine the service demand at each queue. As a preliminary step, we analyze the service demand produced by the processing of a single request for one EJB.

The EJB model uses entity bean components to encapsulate data returned from databases. There are a few well-known alternative architectural design patterns using entity beans which are frequently used in different application designs. Each alternative impacts the behavior of the EJB container and its interactions with other application components differently and, thus, affects the service time of each server resource in the QNM. We therefore produce an analysis of the demand placed on each service queue by an entity bean within a single request when a particular design pattern is used. We refer to the service demand of an architecture pattern A as f^A (which will be subscripted by indexes 1, 2, or 3 for the relevant queue: Request, Container, or DataSource). This forms part of the overall determination of the explicit service demand. Furthermore, the analysis is general and, thus, reusable across any applications that use a given design pattern.

This approach requires us to produce an activity diagram, showing the sequence in which steps execute and, especially, the decision points that lead to different sequences in different situations. When activities proceed in sequence, we determine the total service demand by adding the demands of each activity. When there are different paths, we need to understand the probability of each decision at a decision point. We can determine the total service demand as a weighted average of the demand in each path, with weights corresponding to the probability of taking that path. (When the choice between paths is not probabilistic, but rather reflects a factor known to the designer, such as whether the request is read-only or not, we instead give two formulae for service demand, corresponding to each decision and using the path appropriate for that decision.)

In the analysis of each pattern, it is important to represent only the activities that are necessary to process one EJB's role in a request and ignore those that are concerned with general initialization or other container housekeeping. These are common overheads associated with the request as a whole and can be factored out of the analysis. These aspects are included later, when we combine the service demands associated with each EJB within each request into a parameterized model for the entire design. Thus, we do not consider the general per-request overheads of the container, such as initializing a set of container management services, registering a transaction context with the transaction manager, finalizing a transaction, and reclaiming resources.

To illustrate this, we now give the detailed analysis for two common architectural design patterns: the standard pattern for Container-Managed Persistence (CMP) and a Read-Mostly pattern [10] that works well for data where reads are frequent, as it exploits the cache's treatment of read-only entity EJBs. We derive the service demand from an activity diagram whose form is found in the EJB specification of an entity bean lifecycle [11]. The lifecycles for each architectural pattern in this example start in one of two ways:
- when data is found through findByPrimaryKey, which is invoked on the component's home interface (only one component is identified by the primary key), or
- when one or more data items are found through findByNonPrimaryKey, which is also called on the component's home interface (a collection of references is returned).

3.3.1 Modeling Container-Managed Persistence (CMP)
Fig. 3a shows the activity diagram for the lifecycle of an entity bean found based on its primary key, using the CMP design pattern. Each transaction request is a method of a session bean, which acts as a façade to the entity beans. There is one entity for each table in the database. A session bean first gets a reference to an entity bean from the database by invoking findByPrimaryKey() in the activity labeled FindByPK. If the primary key is present in the database, a reference to the entity bean is returned to the session bean. The container then checks its cache to see whether a corresponding bean instance (identified by its primary key) exists. If so, it transitions to activity LC (Load from Cache), in which a cached instance is returned. We define the cache hit ratio as h, so this transition occurs with probability h. Otherwise, the container transitions to the activity labeled A/P (Activate/Passivate). In this activity, ejbActivate(), which incurs expensive serialization operations, is called to associate a pooled instance with a primary key. If the pool is full, ejbPassivate() is called to serialize a victim entity bean instance to secondary storage. Both the LC and A/P activities are followed by activity LD (Load from Database), in which a call to ejbLoad() synchronizes the state of the entity bean cache with the underlying database. At the end of the lifecycle, we test whether or not any modifications were made; if so, ejbStore() writes to the database the updates made in the transaction. This is represented by activity SD.

We now need to determine the service demand in each of the queues, so we separate the activities LC and A/P involving the Container queue from the activities involving the DataSource (see Figs. 3b and 3c). Consequently, we represent the CMP architecture effect on the Container queue as in (1), where T_LC and T_AP are the service times of activities LC and A/P, respectively.

$$f_2^{cmp} = h\,T_{LC} + (1 - h)\,T_{AP}. \qquad (1)$$

Similarly, we convert the other diagram into an equation to give the CMP architecture's demand on the DataSource queue. One complexity here is that the physical organization of the data tables has a significant impact on the service demand of some operations in the database. In particular, the service demand of the FindByPK activity depends on the type of index available to determine the appropriate row. For this example, we will deal with the table organization where a nonclustered index is created on each table by its primary key constraint, and the table is heap organized. (Other architectural choices for the application might use different physical organizations, such as a clustering index, and if so, they would require different constants in the corresponding analysis. These constants could be determined by benchmarking, just as T_findByPK_NCI is found in our example.) We use a constant T_findByPK_NCI to represent the service demand of the FindByPK activity (here "NCI" stands for "nonclustered index"). As well, T_load and T_store are the service times to load data into, and store updates from, an entity bean. The formula for service demand in the DataSource queue for a bean where only read-only methods of an entity bean are invoked is shown in (2). If both reads and writes occur, we use (3).

$$f_3^{cmp} = T_{findByPK\_NCI} + T_{load}, \qquad (2)$$
$$f_3^{cmp} = T_{findByPK\_NCI} + T_{load} + T_{store}. \qquad (3)$$

Fig. 3. CMP activity diagram using primary key lookup. (a) Overall CMP states. (b) Container queue states. (c) DataSource queue states.
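For readers more familiar with code than activity diagrams, the fragment below shows where these lifecycle costs arise in an EJB 2.x CMP application. It is a hedged sketch: the interface and JNDI names (AccountLocal, AccountLocalHome, java:comp/env/ejb/Account) are illustrative assumptions and do not come from the paper or from Stock-Online.

```java
// Illustrative EJB 2.x fragment (all names are assumptions). A single call to
// credit() drives the lifecycle analyzed above: findByPrimaryKey (FindByPK),
// then either a cache hit (LC) or ejbActivate/ejbPassivate (A/P), then ejbLoad
// (LD); because the bean is modified, ejbStore (SD) runs when the
// container-managed transaction commits.
import javax.ejb.CreateException;
import javax.ejb.FinderException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class TradeFacadeBean implements SessionBean {

    private AccountLocalHome accountHome;

    public void ejbCreate() throws CreateException {
        try {
            accountHome = (AccountLocalHome)
                    new InitialContext().lookup("java:comp/env/ejb/Account");
        } catch (NamingException e) {
            throw new CreateException(e.getMessage());
        }
    }

    /** One container-managed transaction touching a single CMP entity bean. */
    public void credit(Integer accountId, double amount) throws FinderException {
        AccountLocal account = accountHome.findByPrimaryKey(accountId); // FindByPK
        account.setBalance(account.getBalance() + amount);              // dirties the bean -> SD
    }

    // Required SessionBean callbacks (no-ops in this sketch).
    public void setSessionContext(SessionContext ctx) {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
}

// Minimal local interfaces assumed by the sketch.
interface AccountLocal extends javax.ejb.EJBLocalObject {
    double getBalance();
    void setBalance(double balance);
}

interface AccountLocalHome extends javax.ejb.EJBLocalHome {
    AccountLocal findByPrimaryKey(Integer id) throws FinderException;
}
```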

3.3.2 Modeling the Read-Mostly Pattern
The Read-Mostly (RM) pattern is presented in [10]. It is especially appropriate when a high frequency of read-only operations occurs. For RM, each database element is represented by two entity beans, one used for read-only operations, the other for read-write operations. The container is aware of the request type and so accesses the read-only or read-write entity bean cache as appropriate. For a read-only bean, the container goes to state LD only when a cache miss occurs. We denote the cache hit ratio of read-only beans by h_r. The container deals with read-write requests in the same manner as the CMP architecture. The activity diagram for this behavior is shown in Fig. 4.

Again, we analyze the RM architecture based on the access to an entity's data. For a read-only entity bean, we have (4) and (5):

$$f_2^{rm} = h_r\,T_{LC} + (1 - h_r)\,T_{AP}, \qquad (4)$$
$$f_3^{rm} = T_{findByPK\_NCI} + (1 - h_r)\,T_{load}. \qquad (5)$$

For a read-write entity bean,

$$f_2^{rm} = f_2^{cmp} = h\,T_{LC} + (1 - h)\,T_{AP}, \qquad (6)$$
$$f_3^{rm} = f_3^{cmp} = T_{findByPK\_NCI} + T_{load} + T_{store}. \qquad (7)$$
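Equations (1)-(7) translate directly into a small helper that an analyst could use once the profile parameters are known. The sketch below is not part of the paper's toolset; the method and parameter names simply mirror the symbols above, and concrete values would come from a performance profile such as Table 4.

```java
// Per-entity-bean service demands from Eqs. (1)-(7): f2 is the Container-queue
// demand, f3 the DataSource-queue demand, for the CMP and Read-Mostly patterns.
// Parameter names mirror the symbols in the text; values come from benchmarking.
public final class PatternDemands {

    /** Eq. (1)/(6): Container-queue demand for a CMP bean (and an RM read-write bean). */
    public static double f2Cmp(double h, double tLC, double tAP) {
        return h * tLC + (1 - h) * tAP;
    }

    /** Eq. (4): Container-queue demand for an RM read-only bean, hit ratio h_r. */
    public static double f2RmReadOnly(double hR, double tLC, double tAP) {
        return hR * tLC + (1 - hR) * tAP;
    }

    /** Eq. (2)/(3)/(7): DataSource-queue demand for a CMP bean (and an RM read-write bean). */
    public static double f3Cmp(double tFindByPkNci, double tLoad, double tStore, boolean updated) {
        return tFindByPkNci + tLoad + (updated ? tStore : 0.0);
    }

    /** Eq. (5): DataSource-queue demand for an RM read-only bean. */
    public static double f3RmReadOnly(double hR, double tFindByPkNci, double tLoad) {
        return tFindByPkNci + (1 - hR) * tLoad;
    }
}
```

The collection formulas (8)-(13) in the next subsection extend these in the same way, scaling the cache and load terms by the collection size n.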

3.3.3 Modeling Find-by-Nonprimary-Key
In the analysis above, we assume that the entity bean's lifecycle starts when the primary key is used to retrieve the information from the database. This is common, but not universal. Data associated with an entity bean can be found without a primary key, using the FindByNonPrimaryKey method. This returns a collection of entity bean references. The container loads all the matched entity beans into the cache when findByNonPrimaryKey is called. This is modeled by activity FindByNonPK in Fig. 5. When getValue is called, the container uses the cached beans. As bean instances are accessed iteratively, we denote the size of the collection as n. Because the collection of matched entity beans shares activities with substantial service demand, our diagrams and formulae for service demand are no longer per-bean, but rather for the whole collection. If needed, a per-bean formula can be found by dividing the result for the collection by n, thus amortizing service in joint activities over the beans in the collection.

As in the PrimaryKey case in Section 3.3.1, the analysis must be for a particular physical organization of the data. In this example, this represents where a nonclustered B-tree index is created on the nonprimary-key column being searched, and the table is heap organized. For this physical organization, it is a reasonable approximation (in more detail, the service demand is actually a step function, increasing whenever an extra block is touched; we approximate this step function by a straight line) that the

Fig. 4. RM pattern activity diagram with primary key lookup. (a) Overall RM states. (b) DataSource queue states.

service demand of the index descent is constant and the same as for the descent done in FindByPK, and the overhead of scanning the lowest index blocks for row identities is a linear function of the number of additional rows retrieved.

The formulae for a collection accessed by findByNonPrimaryKey are denoted by f_2^col and f_3^col. In the CMP architecture pattern with a nonclustered secondary index on the appropriate column, the corresponding formulae are listed below in (8) and (9), where h_c is the cache hit ratio for a collection, n is the number of entity beans in the collection, n_w is the number of entity beans updated, and T_scan is the service time of scanning one additional row identity. Other constants have the same values as in previous equations.

$$f_2^{col\_cmp} = n\,[h_c\,T_{LC} + (1 - h_c)\,T_{AP}], \qquad (8)$$
$$f_3^{col\_cmp} = T_{findByPK\_NCI} + (n - 1)\,T_{scan} + n\,(1 - h_c)\,T_{load} + n_w\,T_{store}. \qquad (9)$$

Fig. 5. Activity diagram for findByNonPrimaryKey transactions. (a) Overall state. (b) Container queue states. (c) DataSource queue states.

In other data organizations, very different formulae would be appropriate. For example, if no index exists, then FindByNonPK will require a complete scan of the database table, with cost proportional to the size of the table.

In the RM architecture, we modify Fig. 4 by replacing the compound activity FindByPK with FindByNonPK, and we then apply n iterations for the bean processing. Again, n is the number of entity beans in the collection. The equations below represent a collection entirely of read-only entity beans, where h_rc is the cache hit ratio for such a collection of read-only beans:

$$f_2^{col\_rm} = n\,[h_{rc}\,T_{LC} + (1 - h_{rc})\,T_{AP}], \qquad (10)$$
$$f_3^{col\_rm} = T_{findByPK\_NCI} + (n - 1)\,T_{scan} + n\,(1 - h_{rc})\,T_{load}. \qquad (11)$$

For a collection of read-write entity beans, the following apply:

$$f_2^{col\_rm} = f_2^{col\_cmp}, \qquad (12)$$
$$f_3^{col\_rm} = f_3^{col\_cmp}. \qquad (13)$$
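The dependence on the collection size n in (8)-(13) corresponds to code like the hedged fragment below, in which a non-primary-key finder returns a collection and the business method then touches every bean in it; the Holding interfaces and the finder name are illustrative assumptions, not Stock-Online code.

```java
// Illustrative fragment (HoldingLocal, HoldingLocalHome, and findByAccountId are
// assumptions): the finder returns n entity bean references and the loop then
// touches each one, which is why the demands in (8)-(11) grow with n.
import java.util.Collection;
import java.util.Iterator;
import javax.ejb.FinderException;

public class HoldingQueries {

    private final HoldingLocalHome holdingHome;

    public HoldingQueries(HoldingLocalHome holdingHome) {
        this.holdingHome = holdingHome;
    }

    /** Read-only access to every holding of one account. */
    public int totalAmountFor(Integer accountId) throws FinderException {
        Collection holdings = holdingHome.findByAccountId(accountId); // FindByNonPK
        int total = 0;
        for (Iterator it = holdings.iterator(); it.hasNext();) {
            HoldingLocal holding = (HoldingLocal) it.next();          // one of n beans
            total += holding.getAmount();                             // per bean: LC or A/P, then LD
        }
        return total;
    }
}

interface HoldingLocal extends javax.ejb.EJBLocalObject {
    int getAmount();
}

interface HoldingLocalHome extends javax.ejb.EJBLocalHome {
    Collection findByAccountId(Integer accountId) throws FinderException;
}
```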

3.4 Parameterized Performance Model for an Application Design
Next, we must produce a parameterized performance model for an application design with multiple transactions simultaneously utilizing several session and entity bean instances. For each business transaction, the necessary behavioral information must be extracted from design descriptions, such as use case and sequence diagrams. We illustrate this below for the Stock-Online case study. The behavioral description must capture the number and type of EJBs participating in each transaction, as well as the component architecture selected. The service demand of a transaction is determined by aggregating:
- the constant per-request overheads,
- the sum of the service time spent on operations in the participating beans' lifecycles (which reflects the architectural pattern, as discussed above), and
- the contributions for other operations that depend on the application business logic.
For example, creating a new instance of an entity bean and inserting data into the database invokes the create method of an entity bean's home interface. This generates a unique key for the data. Therefore, for each transaction r, the service demand at the Request, Container, and DataSource queues can be calculated as

$$D_{r,1} = T_{req}, \qquad (14)$$
$$D_{r,2} = T_{CS} + \sum_{\text{each item}} f_2 + n_{create}\,T_{create} + n_{remove}\,T_{remove}, \qquad (15)$$
$$D_{r,3} = \sum_{\text{each item}} f_3 + m_{insert}\,T_{insert} + m_{delete}\,T_{delete}. \qquad (16)$$

Here, f_2 and f_3 take the form of the equations in (1), (2), and (3) according to the architectural pattern used in the design. T_req is the per-request demand on the Request queue, and T_CS is the per-request demand on the Container's services (these are each constant for a given platform, independent of the application design). T_create (T_remove) is the service time for the EJB container to create (remove) an entity bean instance, and n_create (n_remove) is the number of entity bean instances created (removed) through the create (remove) method in the home interface. T_insert (T_delete) is the service time to insert (delete) a record into the database through the DataSource queue server, and m_insert (m_delete) is the number of records inserted into (deleted from) the database.

For an application design with multiple transaction types, we produce the expressions above for each transaction, and then the overall service demand is given as a weighted average of these expressions, where the weights reflect the frequency of each transaction in the transaction mix.
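As a worked illustration of (14)-(16) and the transaction-mix weighting, the sketch below assembles per-transaction demands from the per-bean f_2/f_3 terms and then averages them over a business model; all names are ours and any numeric inputs would come from the platform profile of Section 3.5.

```java
// Sketch of Eqs. (14)-(16) plus the transaction-mix weighting. The per-bean f2/f3
// terms are supplied by the pattern analysis of Section 3.3; the create/remove and
// insert/delete counts come from the application's business logic.
import java.util.List;

public final class TransactionDemand {

    public final double d1, d2, d3;   // demands at the Request, Container, DataSource queues

    public TransactionDemand(double tReq, double tCs,
                             List<Double> f2PerBean, List<Double> f3PerBean,
                             int nCreate, double tCreate, int nRemove, double tRemove,
                             int mInsert, double tInsert, int mDelete, double tDelete) {
        double f2Sum = 0.0, f3Sum = 0.0;
        for (double f2 : f2PerBean) f2Sum += f2;
        for (double f3 : f3PerBean) f3Sum += f3;
        d1 = tReq;                                                  // Eq. (14)
        d2 = tCs + f2Sum + nCreate * tCreate + nRemove * tRemove;   // Eq. (15)
        d3 = f3Sum + mInsert * tInsert + mDelete * tDelete;         // Eq. (16)
    }

    /** Weighted-average demands over a transaction mix; weights should sum to 1. */
    public static double[] mix(List<TransactionDemand> transactions, List<Double> weights) {
        double[] d = new double[3];
        for (int i = 0; i < transactions.size(); i++) {
            TransactionDemand t = transactions.get(i);
            double w = weights.get(i);
            d[0] += w * t.d1;
            d[1] += w * t.d2;
            d[2] += w * t.d3;
        }
        return d;   // {D1, D2, D3} to feed into the QNM solution of Section 3.2
    }
}
```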

3.4.1 The Stock-Online Example
Stock-Online is a simulation of an online stock-broking system. It models typical e-commerce application functions. It supports six business transactions and enables users to buy and sell stock, inquire about the prices of stocks, and get a holding statement detailing the stocks they currently own. The supporting database has features to track customers, their transactions, and payments. There are four database tables to store details for accounts, stock items, holdings, and transaction history. The transaction mix can be configured to model a lightweight system with read-mostly operations, or a heavyweight one with intensive update operations. The frequency of each transaction type in two specific business models for Stock-Online is listed in Table 1. Further details on Stock-Online are in [12].

TABLE 1 Two Stock-Online Business Models

3.4.2 Parameterized Performance Model for Stock-Online
As described above, we can estimate the service demand on each queue's server for a design using a representation of


Fig. 6. Sequence diagram of BuyStock.

the number of beans accessed in each transaction type, and the architectural design pattern used. We demonstrate this below for the BuyStock transaction type (using the CMP and RM architectures), and due to space restrictions simply quote the results for the other transaction types.

In this process, we use the concept of a scenario [17]. A scenario traces through the application and can be derived from use cases or class diagrams. It has attributes to specify its name, visit count, type (e.g., read, write, create, and remove), and think time (e.g., idle time between two requests in milliseconds). A scenario comprises multiple calls. A call is the same as a message in sequence diagrams. It has attributes caller, callee, caller scenario, and callee scenario to specify the origin and destination of a call. A call also has other attributes such as the number of calls (or invocation count) in a transaction, the type (e.g., synchronous or asynchronous), and iteration count. If a call is remote, bytes sent and received can be specified to estimate the overhead of the network communication.

For example, the BuyStock transaction in Fig. 6 queries an instance of the Account entity by its primary key. Four of the Stock-Online entity beans, Account, StockItem, StockTx, and StockHolding, are accessed within the transaction boundary of BuyStock. Account, StockItem, StockHolding, and StockTx can be modeled using findByPrimaryKey, as their instances are obtained by primary key. In addition, a StockTx entity bean instance is created, and it is used to insert a transaction history record into the database.

If we take a design for Stock-Online using the standard session-façade CMP architectural pattern, we use (1) and (3) for the service demands for the Account, StockHolding, and StockTx beans, and (1) and (2) for the service demand due to the StockItem bean. Next, from the analysis expressed in (14), (15), and (16), we derive the following formula for the service demand on the Container queue for BuyStock:

$$D_{BuyStock,2}^{CMP} = T_{CS} + 4\,\mathrm{Eq}(1) + T_{create} = T_{CS} + 4\,[h\,T_{LC} + (1 - h)\,T_{AP}] + T_{create}.$$

We similarly use (4) and (6) to analyze the service demand on the Container queue for BuyStock using the RM architectural pattern, obtaining

$$D_{BuyStock,2}^{RM} = T_{CS} + 2\,\mathrm{Eq}(4) + 2\,\mathrm{Eq}(6) + T_{create}.$$

Using the same method for each transaction type, we obtain the service demand of each transaction for the two architecture models in Table 2.

TABLE 2 Parameterized Stock-Online Service Demands


Fig. 7. The basic benchmark scenario.

The overall parameterized performance model is obtained by taking weighted averages of these formulas, with weights reflecting the transaction frequencies in Table 1 for whichever business model is selected as appropriate.

3.5 Platform Performance Profile

3.5.1 The Benchmark Design and Implementation
The key innovation of our modeling method is that we disentangle the impact of properties of the container implementation and the computing platform from those of the application design. Thus, we can measure the infrastructure in an application-independent situation and use these values for the parameters in a model which is produced at design time. This implies that the benchmark application we use to measure the infrastructure should have the minimum application layer possible while still exercising the component infrastructure in a meaningful and useful way. Component technologies leverage many standard services to support applications. The benchmark scenario is thus designed to exercise the key elements of a component container involved in the application execution.

We have designed and implemented a benchmark suite to profile the performance of EJB-based applications on a specified J2EE implementation. The implementation of the benchmark involves a session bean and an entity bean. The benchmark scenario is shown in Fig. 7. The benchmark suite also comprises a workload generator, monitoring utility, and profiling toolkit. We collect performance metrics for the EJB container at runtime, including the number of active threads, active database connections, and the hit ratio of the entity bean cache. A profiling toolkit, OptimizeIt, is also used. It helps in collecting statistics such as the percentage of time spent on a method invocation. From this, we can estimate the percentage of execution time spent in container services. Profiling tools are necessary for black-box COTS component systems, as instrumentation of the source code is not possible.

The benchmark clients simulate requests from proxy applications, such as servlets executing in a Web server. Under heavy workloads, such proxy clients have a negligible interval between two successive requests. Their population in a steady state is consequently bounded. (A Web server has configuration parameters to limit the active workload; for example, Apache uses MaxClients to control the maximum number of workers, so the concurrent requests to the application server are bounded.) Hence, the benchmark client spawns a fixed number of threads for each test. Each thread submits a new service request immediately after the results are returned from the previous request to the EJB. The "thinking time" of the client is thus effectively zero.
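A minimal sketch of such a closed-workload client is shown below, assuming only that each request is a synchronous call wrapped in a Runnable; it is an illustration of the behavior described above, not the benchmark suite itself.

```java
// Minimal closed-workload driver: a fixed number of client threads each issue the
// next request as soon as the previous response returns (zero think time), and the
// completed-request count over the run gives throughput. Illustrative only.
import java.util.concurrent.atomic.AtomicLong;

public final class ClosedWorkloadDriver {

    private final AtomicLong completed = new AtomicLong();
    private volatile boolean running = true;

    /** Runs the given request repeatedly from clientThreads threads for durationMillis. */
    public long run(int clientThreads, long durationMillis, Runnable request)
            throws InterruptedException {
        Thread[] workers = new Thread[clientThreads];
        for (int i = 0; i < clientThreads; i++) {
            workers[i] = new Thread(() -> {
                while (running) {
                    request.run();                // one synchronous EJB invocation
                    completed.incrementAndGet();  // no think time before the next one
                }
            });
            workers[i].start();
        }
        Thread.sleep(durationMillis);
        running = false;
        for (Thread worker : workers) {
            worker.join();
        }
        return completed.get();                    // divide by the duration for throughput
    }
}
```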

3.5.2 Measurements for One Infrastructure
The benchmark suite must be deployed on the same platform (both software and hardware) as the target application. In our case study, the target environment consists of two machines, one for clients, and the other for the container and database server. The hardware and software configuration is listed in Table 3. We refer to this environment as the WLS platform. For the container and database machine, HyperThreading is enabled, effectively making four CPUs available. Two CPUs are allocated for the container and the other two CPUs are allocated for the database server. The container thread pool size is set to 20 and the database connection pool size is also 20, i.e., m = 20 and k = 20 for the QNM in Fig. 2. Each experiment is run several times to achieve confidence that the parameter values obtained have low variability (standard deviation below 3 percent of the measured value).

The profiling tools allowed us to obtain direct measurements for many of the parameters of the performance profile of the container and database, such as T_CS, T_LC, T_create, T_findByPK_NCI, T_scan, T_load, and T_store. We also obtain values for the cache hit rates in different situations (h = 0.69, h_c = 0.93, and h_r = h_rc = 1.0). However, one key parameter was not observable by the tools we had: T_AP.


TABLE 3 Hardware and Software Configuration

TABLE 4 Parameters from Benchmarking on WLS

To find T_AP, we carried out the complete performance prediction analysis for the benchmark application, giving a formula expressing overall response time in terms of all the parameters. By substituting for the parameters we observed, we were left with a model with only one unknown parameter, which we solved by empirically searching for a parameter value that led to matching the observed output. The performance profile for this WLS-based platform is shown in Table 4. The load-dependent rate of the DataSource queue derived from Table 4 is shown in Fig. 8.
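The paper does not spell out the search procedure used for this step, but conceptually it is a one-dimensional fit; the hedged sketch below uses bisection, assuming the model's predicted response time grows monotonically with the candidate T_AP value.

```java
// Sketch of the calibration step: every parameter of the benchmark's performance
// model is known except T_AP, so we search for the T_AP value at which the model
// reproduces the measured benchmark response time. Bisection is used here as an
// assumption; the model is supplied as a function from a candidate T_AP to a
// predicted response time (for example, built on the MVA solver of Section 3.2).
import java.util.function.DoubleUnaryOperator;

public final class ProfileCalibration {

    public static double fitTap(DoubleUnaryOperator predictedResponseTime,
                                double measuredResponseTime,
                                double low, double high, double tolerance) {
        // Assumes the predicted response time increases with T_AP on [low, high].
        while (high - low > tolerance) {
            double mid = (low + high) / 2.0;
            if (predictedResponseTime.applyAsDouble(mid) < measuredResponseTime) {
                low = mid;    // model still too fast: activation/passivation must cost more
            } else {
                high = mid;
            }
        }
        return (low + high) / 2.0;
    }
}
```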

Fig. 8. Derived load-dependent rate of DataSource.

3.5.3 Alternative Container Performance Profile
We also benchmarked another EJB container to produce a second performance profile. In this case, the container implementation was Borland Enterprise Server (BES) 6.5 [3]. We refer to this as the BES platform. The implementation of BES varies immensely from the WLS container implementation. However, we still model the BES container using the QNM in Fig. 2. The CMP and RM EJB architectures implemented using BES can also be mapped to the architecture models we developed in Section 3.3.

We deploy the same benchmark suite on a BES container, with the hardware/software configuration the same as in Table 3. The thread pool size is set to 10 and the database connection pool size is also 10 (the setting recommended for optimal performance by the BES guide: 5 * number of CPUs / number of partitions), such that m = 10 and k = 10 for the QNM in Fig. 2. We carry out similar measurements to obtain parameter values, including the cache hit rates h = 0.759, h_c = 1.0, and h_r = h_rc = 1.0. Note that we do not need to determine new values for those parameters that measure the database tier, such as T_findByPK_NCI, T_scan, T_load, and T_store, because the same database server is used in this platform as for WLS, and so we simply reuse the values calculated in Section 3.5.2. The resulting performance profile for the BES container is shown in Table 5. The load-dependent rate of the DataSource queue derived from Table 5 is shown in Fig. 9.

TABLE 5 Parameters from Benchmarking on BES

Fig. 9. Derived load-dependent rate of DataSource.

3.6 A Quantitative Performance Model
We now have all the pieces needed to obtain quantitative performance predictions. For a given design, on a given container, we can take the parameterized performance model for the design and substitute in the parameter values, most of which are found from the platform's performance profile. The remaining parameters are the cache hit rates. For convenience, we estimate these as being the same as for the benchmark application on that platform. This will give us an explicit performance model, where each queue's server has a known service demand. Standard techniques from queuing theory allow this model to be solved numerically (if not always in closed form).

4 EVALUATING PERFORMANCE PREDICTIONS

4.1 Performance Predictions in the WLS Platform
To verify our approach, two versions of Stock-Online were implemented, one using the CMP architecture and the other using the RM architecture. Each was deployed and run on the same environment used for benchmarking, as described in Section 3.5.1. The mix of transactions was varied to provide both read-intensive and update-intensive business models (see Section 3.4). As discussed in Section 3.5.1, the clients used for performance measurements simulate requests from proxy applications, such as servlets. One client immediately starts the next transaction upon completing the previous one. The predicted server-side response time was then compared to the empirical results.

Fig. 10. Stock-Online Performance on WebLogic Server 8.1.

Fig. 10 shows the performance results of the two implementations of Stock-Online on the WLS platform. The error bars show the standard deviation in each measured quantity. The error of prediction for the read-intensive case is mostly around 5-13 percent and the worst case is about 15 percent. The prediction is that the RM architecture significantly improves performance by about 40-44 percent over CMP. Measurement confirms this prediction. When the business model has twice as many updates, the prediction error ranges between 5-14 percent. It can be seen from the prediction that the advantage of the RM architecture reduces to 30-33 percent as the ratio of read-only transactions decreases. Still, the RM architecture produces better performance than the CMP architecture, a fact that can be found in the predictions since the size of the performance advantage is much larger than the error in prediction.

The model can also help with capacity planning. Suppose that the requirement for the average response time is less than 0.5 seconds. We can use the performance model to predict that the maximum workload of the CMP design to meet this requirement is N = 310, and this degrades to N = 264 when the system has twice as many updates.

The utilization of each server in the QNM is depicted in Fig. 11. The prediction is that, in the CMP architecture, the Container queue server is the most utilized subsystem. It is the bottleneck, with utilization approaching 100 percent. The RM architecture optimizes the cache for read-only data, reducing Container queue demand. Thus, the throughput increases and the utilization of the servers in the Request queue and DataSource queue increases. The Container queue remains the bottleneck software component, and this indicates that, to further improve performance, an extra EJB container must be clustered to increase the processing capacity of the EJB tier.

Fig. 11. Predicted utilization of each server. For both CMP and RM architecture, the figure shows the utilization of each server under two business models, read mostly and doubled updates.

4.2 BES Evaluation
This section evaluates a second set of predictions for the performance of Stock-Online with two architectures (Option B and RM) on Borland Enterprise Server (BES) 6.5 [3]. The Borland configuration for Option B is analogous to the CMP architecture. We use the parameters from the performance profile for the BES server in Table 5. In order to verify the predictions, we deployed Stock-Online on the BES server and measured its performance.

Fig. 12. Performance of Stock-Online on BES.

Fig. 12 shows the predictions and results for the two architectures and business models (read-intensive and intensive updates). The error of the predictions is within 11 percent. We predict that the RM architecture can improve response time by 26 percent (the measurement shows it is about 33 percent) for the read-intensive business model over the CMP architecture, and this advantage diminishes to 20 percent (the measurement shows it is 24 percent) when the business model has intensive updates.

5 CONCLUSIONS AND FUTURE WORK

In this paper, we report a significant contribution to the area of performance prediction for component-based applications. Our approach supports software architects who need to make early architectural choices during the design phase for a component-based, container-hosted application, in order to achieve desired performance goals. We have shown that one can predict eventual performance quite closely (often within 10 percent on measures for our example systems). Our approach derives a quantitative performance model for the design, with parameters that reflect properties of the component container and platform. These parameters can be measured by running a simple benchmark application on the platform.

A major advantage of this approach is that many of its artifacts are reusable. For example, the models we developed can be applied to different EJB containers, and a performance profile can be used to predict the performance of different applications executing on the same container. The infrastructure model, the architecture model, and the performance profile of the platform are each extensible; they can be developed to capture future changes in container architectures. Thus, this approach is potentially evolvable and suitable for the diversity and flexibility of existing, widely used component technologies.

Other J2EE components can dominate performance in more specific application scenarios. These include large data transfers through the application server and database, complex distributed transactions with two-phase commits, and message-oriented asynchronous communication. We plan to extend the infrastructure and architecture models, and carry out more case studies on other EJB-based applications. We are also confident that the approach is general, and could be applied to other component technologies such as COM+ and .NET.

The current weakness of the approach is a lack of tool support. To make performance prediction practical, it must be supported by an appropriate toolset so that the architect can concentrate on the design and predict the performance easily and accurately. Also, hardware contention is included in the benchmarking. Therefore, a performance profile from a benchmark that does not adequately capture the hardware contention can be misleading. Hence, we are currently working toward the construction of a tool set, incorporating model-based custom benchmark generation, automatic benchmark deployment and measurement on the target platform, reusable result repositories, and population of the performance model with benchmark measurements.

ACKNOWLEDGMENTS
National ICT Australia is funded through the Australian Government's Backing Australia's Ability initiative, in part through the Australian Research Council.

REFERENCES

[1] F. Bachmann, L. Bass, and M. Klein, "Deriving Architectural Tactics," Technical Report CMU/SEI-2003-TR-004, Carnegie Mellon Univ., 2003.
[2] S. Balsamo, A.D. Marco, and P. Inverardi, "Model-Based Performance Prediction in Software Development: A Survey," IEEE Trans. Software Eng., vol. 30, no. 5, pp. 295-310, May 2004.
[3] Borland Enterprise Server 6.5 AppServer Edition, http://info.borland.com/techpubs/bes/v65/html_books/index1280x1024.html, 2004.
[4] S. Bernardi, S. Donatelli, and J. Merseguer, "From UML Sequence Diagrams and Statecharts to Analysable Petri Net Models," Proc. Third Int'l Workshop Software and Performance (WOSP'02), pp. 35-45, 2002.
[5] V. Cortellessa and R. Mirandola, "Deriving a Queueing Network Based Performance Model from UML Diagrams," Proc. Second Int'l Workshop Software and Performance (WOSP'00), pp. 36-55, 2000.
[6] Y. Cai, J. Grundy, and J. Hosking, "Experiences Integrating and Scaling a Performance Test Bed Generator with an Open Source CASE Tool," Proc. IEEE Int'l Conf. Automated Software Eng. (ASE'04), Sept. 2004.
[7] C. Canevet, S. Gilmore, J. Hillston, M. Prowse, and P. Stevens, "Performance Modelling with UML and Stochastic Process Algebras," IEE Proc. Computers and Digital Techniques, vol. 150, no. 2, pp. 107-120, 2003.
[8] G. Denaro, A. Polini, and W. Emmerich, "Early Performance Testing of Distributed Software Applications," Proc. Int'l Workshop Software and Performance (WOSP'04), pp. 94-103, Jan. 2004.
[9] J.A. Dilley, R.J. Friedrich, T.Y. Jin, and J. Rolia, "Measurement Tools and Modeling Techniques for Evaluating Web Server Performance," Technical Report HPL-96-161, Hewlett-Packard Labs, 1996.
[10] D. Rakatine, "The Seppuku Pattern," http://www.theserverside.com/patterns/thread.tss?thread_id=11280, 2002.
[11] "Entity Bean Component Contract for Container-Managed Persistence," Enterprise JavaBeans Specification, Version 2.1, Chapter 10, http://java.sun.com/products/ejb/docs.html, 2003.
[12] I. Gorton, Enterprise Transaction Processing Systems. Addison-Wesley, 2000.
[13] I. Gorton and A. Liu, "Performance Evaluation of Alternative Component Architectures for EJB Applications," IEEE Internet Computing, vol. 7, no. 3, pp. 18-23, 2003.

[14] I. Gorton, A. Liu, and P. Brebner, "Rigorous Evaluation of COTS Middleware Technology," Computer, vol. 36, no. 3, pp. 50-55, 2003.
[15] I. Gorton and J. Haack, "Architecting in the Face of Uncertainty: An Experience Report," Proc. 26th Int'l Conf. Software Eng. (ICSE'04), pp. 543-551, 2004.
[16] M. Harkema, B.M.M. Gijsen, R.D. Mei, and Y. Hoekstra, "Middleware Performance: A Quantitative Modeling Approach," Proc. Int'l Symp. Performance Evaluation of Computer and Comm. Systems, 2004.
[17] D. Harel, H. Kugler, R. Marelly, and A. Pnueli, "Smart Play-Out," Proc. 18th Ann. ACM SIGPLAN Conf. Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'03), pp. 68-69, 2003.
[18] P.G. Harrison and M.L. Catalina, "A New Blocking Problem from Java-Based Schedulers," Performance Evaluation, vol. 51, pp. 229-246, 2003.
[19] S. Hissam, G. Moreno, J. Stafford, and K. Wallnau, "Packaging Predictable Assembly," Proc. IFIP/ACM Working Conf. Component Deployment, pp. 108-224, 2002.
[20] P.A. Jacobson and E.D. Lazowska, "Analyzing Queueing Networks with Simultaneous Resource Possession," Comm. ACM, vol. 25, no. 2, pp. 142-151, 1982.
[21] S. Kounev and A. Buchmann, "Performance Modeling of Distributed E-Business Applications Using Queuing Petri Nets," Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software, 2003.
[22] E. Lazowska, J. Zahorjan, S. Graham, and K. Sevcik, Quantitative System Performance. Prentice Hall, 1984.
[23] T.K. Liu, A. Behroozi, and S. Kumaran, "A Performance Model for a Business Process Integration Middleware," Proc. IEEE Int'l Conf. E-Commerce, pp. 191-198, 2003.
[24] Y. Liu, A. Fekete, and I. Gorton, "Predicting the Performance of Middleware-Based Applications at the Design Level," Proc. Int'l Workshop Performance and Software Eng. (WOSP'04), pp. 166-170, 2004.
[25] Y. Liu, "A Framework to Predict the Performance of Component-Based Applications," PhD thesis, Univ. of Sydney, Australia, 2004.
[26] Y. Liu and I. Gorton, "Performance Prediction of J2EE Applications Using Messaging Protocols," Proc. Int'l SIGSOFT Symp. Component-Based Software Eng. (CBSE'05), pp. 1-16, 2005.
[27] J.P. López-Grao, J. Merseguer, and J. Campos, "From UML Activity Diagrams to Stochastic Petri Nets: Application to Software Performance Engineering," Proc. Fourth Int'l Workshop Software and Performance (WOSP'04), pp. 25-36, 2004.
[28] D. Menascé and V.A.F. Almeida, Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning. Prentice-Hall, 2000.
[29] J.A. Rolia and K.C. Sevcik, "The Method of Layers," IEEE Trans. Software Eng., vol. 21, no. 8, pp. 689-700, Aug. 1995.
[30] J. Skene and W. Emmerich, "A Model-Driven Approach to Non-Functional Analysis of Software Architectures," Proc. IEEE Int'l Conf. Automated Software Eng. (ASE'03), pp. 236-239, 2003.
[31] C.U. Smith and L.G. Williams, "PASA: A Method for the Performance Assessment of Software Architectures," Proc. Third Int'l Workshop Software and Performance (WOSP'02), pp. 179-189, 2002.
[32] C.M. Woodside, J.E. Neilson, D.C. Petriu, and S. Majumdar, "The Stochastic Rendezvous Network Model for Performance of Synchronous Client-Server-Like Distributed Software," IEEE Trans. Computers, vol. 44, no. 1, pp. 20-34, Jan. 1995.
[33] J. Xu, C.M. Woodside, and D. Petriu, "Performance Analysis of a Software Design Using the UML Profile for Schedulability, Performance and Time," Proc. Computer Performance Evaluation, Modelling Techniques and Tools (TOOLS'03), pp. 291-310, 2003.

Yan Liu received the PhD degree in computer science from the University of Sydney, Australia, in 2004. She is a researcher at National ICT Australia. Her main research interests are in the areas of performance analysis and evaluation of large component-based systems. She is a member of the IEEE and the IEEE Computer Society.

Alan Fekete received the PhD degree from the mathematics department of Harvard University. He is an associate professor in the School of Information Technologies at the University of Sydney. Most of his research lies in the application of methods from theoretical computer science to software systems. His recent work includes ways to ensure consistency in service-oriented distributed systems and in transactional applications that use weak isolation mechanisms within the scope of a database management system. He is also active in the computer science education community. He is a member of the IEEE Computer Society.

Ian Gorton received the PhD degree in computer science from Sheffield Hallam University. He is a senior researcher at National ICT Australia. Until March 2004, he was chief architect of information sciences and engineering at the US Department of Energy's Pacific Northwest National Laboratory. His interests include software architectures, particularly those for large-scale, high-performance information systems. He is a member of the IEEE and the IEEE Computer Society.