SOFTWARE – PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2015; 45:613–632 Published online 23 December 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/spe.2236
A self-scalable load injection service
Alain Tchana1,*,†, Noel De Palma1, Bruno Dillenseger2 and Xavier Etchevers2
1 University of Joseph Fourier (LIG), 621 avenue Centrale, Saint-Martin-d'Hères, BP 53, 38041 Grenoble cedex 9, France
2 Orange Labs, Grenoble, 28 Chemin du Vieux Chêne, BP 98, F-38243 Meylan cedex, France
SUMMARY Load testing of applications is an important and costly activity for software provider companies. Classical solutions are very difficult to set up statically, and their cost is prohibitive in terms of both human and hardware resources. Virtualized cloud computing platforms provide new opportunities for stressing an application’s scalability, by providing a large range of flexible and less expensive (pay-per-use model) computation units. On the basis of these advantages, load testing solutions could be provided on demand in the cloud. This paper describes a Benchmark-as-a-Service solution that automatically scales the load injection platform and facilitates its setup according to load profiles. Our approach is based on: (i) virtualization of the benchmarking platform to create self-scaling injectors; (ii) online calibration to characterize the injector’s capacity and impact on the benched application; and (iii) a provisioning solution to appropriately scale the load injection platform ahead of time. We also report experiments on a benchmark illustrating the benefits of this system in terms of cost and resource reductions. Copyright © 2013 John Wiley & Sons, Ltd. Received 12 August 2012; Revised 26 September 2013; Accepted 3 October 2013 KEY WORDS:
benchmarking as a service; cloud; resource allocation
1. INTRODUCTION

Load testing [1–4] of applications has always been a crucial yet expensive activity for Internet companies. Traditionally, load testing leverages a load injection platform that generates traffic on the basis of pre-defined load profiles to stress an application (a system under test, or SUT for short) to its limits [5–7]. These solutions are very difficult to set up statically, and their costs can be prohibitive in terms of human and hardware resources. Cloud computing provides new opportunities and challenges for testing application scalability because it makes it possible to automatically deliver information technology (IT) resources and services on a per-demand, self-service basis over the network. One characteristic of cloud computing is its high degree of automation for provisioning and its on-demand management of IT resources (computation, storage, and network resources) and services. IT resources can be provisioned in a matter of minutes in the cloud, rather than in days or weeks in a classical setup. Opportunities lie in the fact that load testing solutions can be provided on demand as a service on the cloud. This type of Benchmark-as-a-Service (BaaS) solution provides many benefits in terms of cost and resources. The cost of hardware, software, and tools is charged on the basis of usage. In addition, the platform for the tests is much simpler to set up, so that the testers can focus on their load injection campaign. The challenge of BaaS solutions is to provide test teams with on-demand computing and networking resources, capable of generating traffic on an SUT (which can be deployed and configured on a different cloud long before initiating BaaS). This type of test campaign typically

*Correspondence to: Alain Tchana, University of Joseph Fourier (LIG), 621 avenue Centrale, Saint-Martin-d'Hères, BP 53, 38041 Grenoble cedex 9, France.
† E-mail:
[email protected]
Figure 1. Broad picture of a load testing infrastructure.
requires more than one load injection machine to generate sufficient traffic (Figure 1). The difficulty is that the number of load injection machines necessary cannot be predicted, as it depends on the amount of resources consumed by generating and managing requests and their responses, as well as on the target global workload. The tester must empirically cope with the following risks:
- Overloading the load injectors, which may lead to scenarios not behaving as specified, and biased measurements.
- Wasting unnecessary resources.
For these reasons, self-scalable load injection software is necessary to make it possible to adjust the number of load injection machines during testing. This paper specifically addresses this challenge: We describe a BaaS platform (BaaSP) that automatically scales the load injection platform. We also report experiments on the RUBiS [8] and Java Message Service (JMS) messaging (Joram [9]) benchmarks illustrating the benefits in terms of self-scalability, including cost reduction for lengthy campaigns. In addition to re-engineering a load injection tool to enable self-scalability, the main advantages of this BaaSP are: (i) online injector calibration; (ii) computation from the load profile and injector characterization to determine the right number of virtual machines (VMs); and (iii) control of VM provisioning sufficiently ahead of time. The remainder of this paper is structured as follows: Section 2.1 presents an overview of load injection, Section 2.2 presents the CLIF load injection framework, Sections 3 and 4 describe all the aspects of our auto-scaling protocol, Section 5 deals with the evaluation of this work, and Sections 6 and 7, respectively, present related work and conclude this paper.

2. BACKGROUND

2.1. Load injection overview

2.1.1. Overview. Performance testing aims to observe the behavior of a given SUT under specific workload conditions. Observations include concerns relating to quality of experience, that is, user-perceived quality, such as response time, availability, and reliability. In addition, energy, computing, storage and networking usage, the number of users simultaneously served, request throughput, or even SUT crash are assessed. Through testing, the provider can define the optimal sizing and tuning of the SUT, in order to deliver services with sufficient quality of experience to the expected number of users, while at the same time minimizing infrastructure costs. Unlike performance evaluation techniques based on modeling, mathematical analysis, and simulation of performances, load injection consists in actually generating real requests, submitting them to the system, and measuring the responses. Rajendra Jain [10] summarizes all these approaches. Several research works on this topic use these approaches. For instance, [1] describes automated test case generation to stress multimedia systems. In the same vein, [2] and [11] describe, respectively, prototypes for embedded systems and for e-commerce applications. Our work involves performance testing based on load injection, as illustrated in Figure 1. Load injectors generate requests, attempting to emulate real user sessions, through the common concept of the virtual user (vUser). In fact, load testing generally requires (sufficiently) realistic workloads to be generated if meaningful results are to be provided. Because generating requests consumes significant computing resources, several load injectors are often necessary to provide the required workload complexity and scale.
As a complement to load injectors, probes make it possible to observe the amount of resources consumed by the SUT and by the load injection system. Observing resource usage by SUTs is interesting for analysis and can be used for troubleshooting and tuning purposes. Observing resource usage by load injectors is also important, as resource shortage on the load injection side is very likely to result in biased response time measurements, and the workload generated will be lower than expected. Common probes measure system load (CPU, memory, disk, and network adaptors), but other probes may be useful to observe usage of specific equipment or software elements (database, web server, network equipment, etc.). Load injectors and probes are commonly bound to a supervision element, providing a centralized point of control and monitoring.

2.1.2. Practice and limits. As for any testing activity, performance testing involves a two-edged trade-off: testing time versus time to market, and testing costs versus return on investment (with regard to the costs of fixing bugs). The first practical limit is that setting up a load testing infrastructure is both costly and time consuming. Moreover, for management purposes, it is not readily shared between different projects, and consequently, it remains inactive most of the time, from one test campaign to another. Hence the interest in cloud technologies to run load injection, even though the infrastructure cannot be controlled or shared. These are serious issues when it comes to analyzing test results. However, an on-demand, ready-to-use load testing infrastructure available through a BaaS portal would be very promising because it makes it feasible to generalize cheap and quick load testing. In load testing, the infrastructure is not the only source of expense. Defining and running workloads also contribute significantly to test campaign costs. Besides the burden of defining realistic vUsers, setting the workload level, for example, in terms of the number of active vUsers, is typically done empirically, with several iterations leading to considerable time consumption. Selfbench [12] proposes self-regulated load injection, driven by live performance measurements and monitoring of the SUT's resource usage. This system autonomously manages the number of active vUsers but cannot change the number of load injectors the vUsers are dispatched on. Thus, elasticity in terms of the number of load injection nodes is also a good reason to develop a cloud-based approach to load testing, because it would save the burden of empirically sizing the load injection infrastructure.

2.2. The CLIF load injection framework

This work was performed using the CLIF load injection framework. CLIF is a versatile, open source load testing software [13]. It is both generic and extensible, in terms of target SUT protocols and resources to monitor. A workload scenario combines the definition of one or more vUsers and their behavior, while specifying the number of active vUsers over time. This is called the load profile. A behavior is basically a sequence of requests separated by think times (i.e., periods of inactivity); this sequence can be enriched by adding conditional and loop statements, as well as probabilistic branches. Behavior descriptions make use of plug-ins to support a variety of features, mainly injection protocols (Hypertext Transfer Protocol, FTP, Session Initiation Protocol, etc.) and external data provisioning to make request parameters variable.
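To make the notion of a load profile concrete, the following minimal sketch represents it as a step function giving the number of active vUsers over time. This is only an illustration in Java; it is not CLIF's scenario format, and the class and method names are assumptions introduced here.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative load profile: the number of active vUsers as a step function of time (in seconds). */
public class LoadProfile {

    // Keys are times (s) at which the vUser count changes; values are the new counts.
    private final NavigableMap<Integer, Integer> steps = new TreeMap<>();

    public void setPoint(int timeSec, int vUsers) {
        steps.put(timeSec, vUsers);
    }

    /** W(t): the number of active vUsers required at time t. */
    public int vUsersAt(int timeSec) {
        Map.Entry<Integer, Integer> e = steps.floorEntry(timeSec);
        return (e == null) ? 0 : e.getValue();
    }

    public static void main(String[] args) {
        // Pyramidal profile in the spirit of workload scenario 1: ramp-up then ramp-down over 1200 s.
        LoadProfile w = new LoadProfile();
        for (int t = 0; t <= 600; t += 60)    w.setPoint(t, t / 3);            // ramp-up
        for (int t = 660; t <= 1200; t += 60) w.setPoint(t, (1200 - t) / 3);   // ramp-down
        System.out.println("W(300) = " + w.vUsersAt(300) + " vUsers");
    }
}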
As described in detail in [14], CLIF's architecture is based on the Fractal component model [15]. Figure 2 shows the main components of the CLIF architecture. Load injector and probe components are distributed throughout the network. Injectors are responsible for generating the workload and measuring response times, while probes measure consumption of arbitrary resources: CPU, memory, network adapter or equipment, database, middleware, and so on. Load injectors and probes are bound to a central control and monitoring component, namely, the Supervisor, and a central Storage component, which collects all measurements upon test completion. All these components are contained in the top-level, distributed CLIF application (ClifApp) composite component. When running a load test, an initial ClifApp is first created containing no probe and no injector. Then, the necessary injectors and probes are deployed on local or remote hosts. CLIF provides a network service, the code server, to deliver Java classes and resource files that may be needed
Figure 2. Main components of the CLIF architecture.
by injectors and probes (e.g., scenario files, data sets, probe configuration files, etc.). Injectors and probes are added to the ClifApp component, and their interfaces are bound to the Supervisor and Storage components’ interfaces.
3. BAAS OVERVIEW

This section describes the main components and design principles of our self-scaling benchmarking platform (based on CLIF). The main purpose of this BaaSP is to minimize the cost of performing tests in a cloud environment. This cost mainly depends on the number of VMs used and their uptime throughout the test. Because each CLIF injector runs on a separate VM, BaaSP offers a testing protocol that attempts to reduce both the number of VMs used and their execution time. This protocol relies on dynamic addition/removal of CLIF injectors according to variations in the submitted load profile. Roughly, instead of statically setting an oversized number of VM injectors, the BaaSP dynamically adds or removes injectors during the test in line with the workload. In addition, the BaaSP attempts to use each injector at its maximum capacity before adding another one. We will now present the self-scaling protocol implemented in the BaaSP:
1. Virtual machine allocation and system deployment in the cloud. The first step involves deployment and configuration of the CLIF benchmarking system, possibly including the system to test (the SUT). This is optional, as the SUT can be deployed and configured with another deployment system long before benchmarking. The allocation and deployment phase includes VM allocation and instantiation in the cloud. Note that the cloud platforms that run the BaaSP and the SUT can be different.
2. Calibration and planning. The calibration phase aims to determine the maximum capacity of an injector VM. This information will then be used to plan when to add/remove injectors during the test.
3. Testing and injector provisioning. The actual test starts with a minimal number of injectors. Their execution (request injection) should follow the defined load profile. The BaaSP adds/removes injector VMs according to the plan developed in the previous stage.
4. System undeployment and VM deallocation. This phase is the reverse of step one. At the end of the test, the BaaSP automatically undeploys and frees all the VMs it has instantiated in the cloud.
Figure 3 presents the architecture of the BaaSP and an overview of how it works. Compared with the architecture of CLIF (Figure 2, presented in Section 2.2), Figure 3 shows the additional components we have added to CLIF to implement the self-scaling and capacity planning features: the Calibrator and the Planner. Our architecture is organized as follows: a VM (called BaaSPCore) is responsible for orchestrating the test and initiating each test phase (deployment, calibration, test launching, and undeployment); the Calibrator component evaluates the capacity of an injector; and the Planner decides when to add/remove injector VMs during benchmarking.
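The four phases above are driven by the BaaSPCore VM. The sketch below only illustrates this sequencing; the Deployer, Calibrator, and Planner interfaces and their method names are assumptions made for the example, not the actual BaaSP API.

/** Illustrative sequencing of the BaaSP test phases (interfaces and method names are assumed). */
public class BaaSPCore {

    interface Deployer   { void allocateAndDeploy(); void undeployAndRelease(); }
    interface Calibrator { int calibrateInjectorCapacity(); }                    // returns InjMaxCapacity (vUsers)
    interface Planner    { void plan(int injMaxCapacity); void runTest(); }

    private final Deployer deployer;
    private final Calibrator calibrator;
    private final Planner planner;

    BaaSPCore(Deployer deployer, Calibrator calibrator, Planner planner) {
        this.deployer = deployer;
        this.calibrator = calibrator;
        this.planner = planner;
    }

    void runCampaign() {
        deployer.allocateAndDeploy();                                 // 1. VM allocation and deployment
        int injMaxCapacity = calibrator.calibrateInjectorCapacity();  // 2. calibration of one injector VM
        planner.plan(injMaxCapacity);                                 // 2. planning of injector provisioning
        planner.runTest();                                            // 3. test with dynamic add/removal of injectors
        deployer.undeployAndRelease();                                // 4. undeployment and VM deallocation
    }
}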
Figure 3. (top) Self-scaling Benchmark-as-a-Service architecture and (bottom) how it works.
4. SELF-SCALING PROTOCOL

This section details the self-scaling protocol. First of all, we describe the calibration mechanism, followed by capacity planning. The section ends by describing the deployment process, including the injector provisioning protocol.

4.1. Calibrating with CLIF Selfbench

This phase aims to evaluate the load injection capacity of an injector VM in terms of the maximum number of clients it can emulate (vUsers). To evaluate an injector's capacity, the Calibrator uses a CLIF extension module called Selfbench [12]. Selfbench is the fruit of research into automating black box performance modeling [12]. Part of this work consists of a self-driven ramp-up of the workload, which attempts to determine the maximum number of vUsers a SUT can serve before its resources are considered insufficient. Because Selfbench makes no assumptions about the SUT's capacity, it starts with a single vUser. From the response times and throughput the load injector gets, Selfbench computes the SUT's theoretical maximum capacity, while making minimal assumptions about its parallel processing capacity (single threaded). Then, Selfbench increases the number of vUsers stepwise, until either the theoretical maximum capacity or the SUT saturation limit is reached. The number of steps is defined as a parameter. If the SUT becomes saturated, this indicates that the maximum number of vUsers it can serve has been reached. Otherwise, Selfbench makes a more optimistic assumption about the SUT's capacity, with more extensive parallel processing, and runs a new stepwise workload increase. Determining the duration of steps combines theoretical results from queue modeling (for an introduction, see [10]) with statistical considerations about the number of samples and their stability. Step duration is equal to M/(F*T), where M is the maximum load supported by the VM during the previous step, T is the mean request throughput for each vUser, and F is a fineness factor parameter. Larger values for F give more accurate results but require longer calibration. When response times rise, the
sleeping time for vUsers’ threads increases, occupying less CPU time. Non-optimal calibration is preferable to over-optimistic calibration, which may result in test failure. SUT saturation is defined as the maximum or minimum thresholds on several load metrics, such as CPU usage, free memory, or any other resource consumption that a CLIF probe can monitor. For the work presented here, we use Selfbench in a slightly different way. The injector VM calibration is not based on detecting SUT saturation but on injector VM saturation. Thus, the CLIF probes must be deployed along with the injector VM rather than on the SUT side (nevertheless, SUT saturation should be checked for). At the end of execution, Selfbench gives the number of vUsers reached before injector VM saturation.
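For illustration, the calibration loop can be sketched as follows. This is a simplified reconstruction of the stepwise ramp-up described above, under assumed interfaces for running a load step and probing injector-VM saturation; it is not the Selfbench implementation (in particular, the capacity estimate is a placeholder, and the real Selfbench relaxes its parallelism assumption and iterates when the theoretical maximum is reached without saturation).

/** Illustrative stepwise calibration of an injector VM, in the spirit of Selfbench (assumed interfaces). */
public class InjectorCalibrator {

    interface Injector { double runStep(int vUsers, long durationMs); } // returns mean throughput per vUser (req/s)
    interface Probe    { boolean injectorSaturated(); }                 // e.g., CPU or memory threshold crossed

    private final Injector injector;
    private final Probe probe;
    private final int steps;        // number of ramp-up steps (a Selfbench parameter)
    private final double fineness;  // fineness factor F: larger values are more accurate but calibrate longer

    InjectorCalibrator(Injector injector, Probe probe, int steps, double fineness) {
        this.injector = injector;
        this.probe = probe;
        this.steps = steps;
        this.fineness = fineness;
    }

    /** Returns the number of vUsers reached before the injector VM saturates (InjMaxCapacity). */
    int calibrate() {
        int vUsers = 1;                 // no assumption about capacity: start with a single vUser
        int lastUnsaturated = 0;
        double throughputPerVUser = injector.runStep(vUsers, 1_000);
        int theoreticalMax = estimateCapacity(vUsers);   // placeholder for the queueing-model estimate

        while (!probe.injectorSaturated() && vUsers < theoreticalMax) {
            lastUnsaturated = vUsers;
            vUsers += Math.max(1, (theoreticalMax - vUsers) / steps);
            // Step duration M/(F*T): M = load supported in the previous step, T = throughput per vUser.
            long stepMs = (long) (1_000L * lastUnsaturated / (fineness * throughputPerVUser));
            throughputPerVUser = injector.runStep(vUsers, Math.max(stepMs, 1_000L));
        }
        // The real Selfbench would relax its parallelism assumption and iterate if no saturation occurred.
        return probe.injectorSaturated() ? lastUnsaturated : vUsers;
    }

    private int estimateCapacity(int vUsers) {
        // Simplistic stand-in for Selfbench's theoretical capacity computation (assumption).
        return Math.max(10 * vUsers, vUsers + 1);
    }
}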
4.2. Planning

Assuming that all injector VMs in the BaaSP have the same resources, all injectors will have the same capacity (called InjMaxCapacity hereafter) as assessed by the Calibrator. On the basis of this assumption and the time required to deploy injector VMs, the Planner can then plan injector provisioning ahead of time for the selected load profile. Let TTSVM be the deployment time function, which gives the deployment time needed to start a given number of VMs in the cloud. This function is given by the BaaSP's operator and depends on the cloud infrastructure used (in our case, we established our private cloud to configure this parameter, as reported in Section 4.3). The load profile (W) can be expressed as a discrete function of the number of vUsers over time: vUsers = W(t) means that the load profile requires 'vUsers' virtual users to emulate the required workload at time t. Thus, the planning process can be expressed as a function: f(W, InjMaxCapacity, TTSVM). The Planner parses the load profile (W) and produces a provisioning plan (taking the deployment time, TTSVM, into account), in line with the capacity of an injector VM (InjMaxCapacity). This provisioning function is described in Algorithm 1, which returns a hash table VMAt (key and value) where each 'key' represents a time when the Planner should add/remove injectors. Table I shows a step-by-step execution of this algorithm (with illustrative rather than realistic values). The load profile in this illustration is pyramidal (a ramp-up phase followed by a ramp-down phase) and runs for 20 s. The InjMaxCapacity parameter is 10 vUsers, while the TTSVM function indicates that the cloud platform takes 3 s to simultaneously start between 1 and 10 VMs. With these parameters, the algorithm plans to start one injector at the beginning of the test campaign (VMAt[0]) and an additional injector at time 3. This second injector will actually start to send requests at time 6 and will be switched off at time 15. Note that the first injector saturates at time 6, which corresponds to the start time of the second injector. Another role of the Planner is to prepare the load profiles to be executed by injectors added during benchmarking. Algorithm 4 is used by the Planner to generate these load profiles. If Inj_Max represents the maximum number of injectors that can be simultaneously used during the test, the Planner generates Inj_Max load profiles: Wi, 1 ≤ i ≤ Inj_Max. The purpose of Algorithm 4 is to generate all Wi. Then, when an injector is added, it is configured to use a particular Wi. How a load profile is assigned to an injector is given in the algorithm. During the test, running injectors are numbered from 1 to CurrentNbInj, where CurrentNbInj is the current number of injectors. Thus, each injector i, 1 ≤ i ≤ CurrentNbInj, runs load profile Wi. When the Planner wants to add nbAdd injectors, it numbers them from CurrentNbInj + 1 to CurrentNbInj + nbAdd. Each new injector j, CurrentNbInj + 1 ≤ j ≤ CurrentNbInj + nbAdd, will run load profile Wj. Table II shows the execution of this second algorithm. It uses the same input as Table I. According to the maximum number of injectors determined by Algorithm 1, two load profiles are generated for this example; the first injector will run W1, while the second will run W2. Notice that the first and the last entries of W2 coincide with the start and end times of the second injector, respectively (as planned by Algorithm 4). Finally, the Planner is implemented as a control loop.
If 'Timers' represents the set of keys (which are time units) of the hash table VMAt, then the Planner wakes up at each element in 'Timers' and adjusts the number of injectors. The first entry of VMAt (VMAt[0]) represents the initial number of injectors, deployed before the beginning of the benchmarking process.
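To make the planning step concrete, the following sketch reproduces the behavior illustrated in Table I (one injector at time 0, a second one requested at time 3 and released at time 15). It is a simplified reconstruction, not the paper's Algorithm 1: the load profile is sampled every second, TTSVM is taken as a constant, and VMAt is assumed to map each reconfiguration time to the target number of injectors.

import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntUnaryOperator;

/** Simplified sketch of injector provisioning planning (not the paper's Algorithm 1). */
public class ProvisioningPlanner {

    /**
     * @param w              load profile: W(t) = required vUsers at time t (seconds)
     * @param durationSec    total duration of the load profile
     * @param injMaxCapacity vUsers one injector VM can emulate (from calibration)
     * @param ttsvmSec       time to start an injector VM in the cloud (assumed constant here)
     * @return VMAt: reconfiguration time -> target number of injectors from that time on
     */
    static Map<Integer, Integer> plan(IntUnaryOperator w, int durationSec,
                                      int injMaxCapacity, int ttsvmSec) {
        Map<Integer, Integer> vmAt = new TreeMap<>();
        int current = -1;
        for (int t = 0; t <= durationSec; t++) {
            // Injectors needed at time t to emulate W(t) vUsers.
            int needed = Math.max(1, (int) Math.ceil((double) w.applyAsInt(t) / injMaxCapacity));
            if (needed != current) {
                // Additions are scheduled TTSVM seconds ahead of time; removals are immediate.
                int when = (needed > current) ? Math.max(0, t - ttsvmSec) : t;
                vmAt.put(when, needed);
                current = needed;
            }
        }
        return vmAt;
    }

    public static void main(String[] args) {
        // Pyramidal 20 s profile peaking at 20 vUsers, InjMaxCapacity = 10, TTSVM = 3 s (Table I setting).
        IntUnaryOperator w = t -> (t <= 10) ? 2 * t : 2 * (20 - t);
        System.out.println(plan(w, 20, 10, 3));   // prints {0=1, 3=2, 15=1}
    }
}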
4.3. Dynamic injector provisioning

There are two main difficulties in implementing dynamic injector provisioning. First, because the original load profile is given as a unique function, we have to express the workload function to be run by each new injector. Let W be the original workload, nbInj the current number of injectors, and Wi the workload function of injector i. The following constraint must be respected at all times t:

W(t) = Σ_{i=1..nbInj} Wi(t)

Generation of all Wi satisfying this constraint is done by the Planner through Algorithm 4. The next two sections describe the protocols implemented to achieve dynamic addition/removal of injector VMs.

4.3.1. Injector addition protocol. The Planner initiates the addition of new injectors. The injector addition protocol we implement is as follows:
Table I. Execution of Algorithm 1: an illustration of planned injector VM provisioning.
1. The Planner asks the Deployer to create the number of new injector VMs that are required at a given time in the existing environment. This request contains the load profiles that the injectors will run.
2. The Deployer asks the Infrastructure as a Service (IaaS) to start each required new VM, in parallel.
3. Each new VM is equipped with a deployment agent that informs the Deployer that it is running.
4. The Deployer sends its configuration to each new injector, including the load profile it will run.
5. Each new injector registers its configuration by contacting the ClifApp.
6. The ClifApp integrates the received injector configuration into its injector list and forwards this configuration to its inner components (Supervisor, etc.). After this, the ClifApp asks the added injectors to start load injection according to their assigned profiles.
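Read as a sequence of interactions, the addition protocol can be sketched as follows. The interfaces and method names (Iaas, Deployer, ClifApp and their operations) are assumptions made for illustration; they do not correspond to the actual BaaSP or CLIF APIs.

import java.util.List;

/** Illustrative sequence of the injector addition protocol (interfaces and names are assumed). */
public class InjectorAddition {

    interface Iaas     { List<String> startVms(int count); }                 // returns VM identifiers
    interface Deployer { void waitForAgents(List<String> vms); void pushConfig(String vm, String profile); }
    interface ClifApp  { void register(String injector); void startInjection(String injector, String profile); }

    static void addInjectors(Deployer deployer, Iaas iaas, ClifApp clifApp,
                             int count, List<String> profiles) {
        List<String> vms = iaas.startVms(count);        // 2. start the new VMs in parallel
        deployer.waitForAgents(vms);                    // 3. deployment agents report back
        for (int i = 0; i < vms.size(); i++) {
            String vm = vms.get(i);
            String profile = profiles.get(i);           // load profile Wj assigned by the Planner (step 1)
            deployer.pushConfig(vm, profile);           // 4. send configuration, including the load profile
            clifApp.register(vm);                       // 5-6. the injector is registered with the ClifApp
            clifApp.startInjection(vm, profile);        // 6. injection starts according to the assigned profile
        }
    }
}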
Table II. Execution of Algorithm 4: an illustration of planning injector load profiles.
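The per-injector load profiles W1, ..., Wn illustrated in Table II must satisfy the constraint that their sum equals the global profile W at every instant (Section 4.3). The sketch below shows one simple way to split W under that constraint, filling injectors in order up to InjMaxCapacity; it is an illustration of the idea only, not the paper's Algorithm 4.

import java.util.function.IntUnaryOperator;

/** One possible way to derive per-injector profiles Wi from the global profile W (illustrative only). */
public class ProfileSplitter {

    /**
     * Wi(t): the share of W(t) assigned to injector i (1-based), filling injectors in order
     * up to injMaxCapacity so that the sum over all i of Wi(t) equals W(t).
     */
    static int share(IntUnaryOperator w, int injMaxCapacity, int i, int t) {
        int total = w.applyAsInt(t);
        int alreadyAssigned = (i - 1) * injMaxCapacity;
        return Math.max(0, Math.min(injMaxCapacity, total - alreadyAssigned));
    }

    public static void main(String[] args) {
        // Same pyramidal example as Tables I and II: W peaks at 20 vUsers, InjMaxCapacity = 10.
        IntUnaryOperator w = t -> (t <= 10) ? 2 * t : 2 * (20 - t);
        for (int t = 0; t <= 20; t += 5) {
            System.out.printf("t=%2d  W=%2d  W1=%2d  W2=%2d%n",
                    t, w.applyAsInt(t), share(w, 10, 1, t), share(w, 10, 2, t));
        }
    }
}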
4.3.2. Injector removal protocol. As with the previous protocol, the Planner initiates injector removal. The removal protocol we implement runs as follows:
1. The Planner asks the Deployer to stop the execution of a number of injector VMs.
2. The Deployer asks the ClifApp to unregister the injectors from the ClifApp injector list. The ClifApp forwards this reconfiguration to its inner components (Supervisor, etc.) and asks the corresponding injectors to stop injecting load.
3. Once the injector has been unregistered from the ClifApp, the Deployer is notified by the ClifApp that the corresponding VMs are no longer considered.
4. Finally, the Deployer asks the IaaS to turn off the VM hosting the injector.
After calibration and planning, the test can be continued in two ways. If the SUT needs to be stabilized before becoming usable, then the calibration phase serves as the stabilization phase. Thus, the real test will continue immediately after calibration. Otherwise, the SUT is restarted before the test is launched. In both cases, the Planner adapts the initial number of injectors according to the load profile and the injector VM's capacity. The next section deals with how this protocol was evaluated.

5. EVALUATION

This section presents the experiments we performed to evaluate our BaaSP. The purpose of this evaluation was to confirm that the cost of benchmarking applications in the cloud is reduced when using our platform. We also show the advantages of provisioning injectors just ahead of time. After describing the evaluation context (software and hardware) and the workload scenarios used, the end of this section presents the results of the evaluation.

5.1. Evaluation context

5.1.1. The system under test. The SUT is provided by two types of applications: a web application and a JMS messaging service. The web application is RUBiS [8] (version 1.4.3), a Java Enterprise Edition (JEE) benchmark based on servlets. RUBiS implements an auction web site modeled on eBay. It defines interactions such as registering new users and browsing, buying, or selling items. For the evaluation described here, we submitted only browsing requests. We deploy the RUBiS open source middleware solution composed of one Apache (2.2.14) web server (with Mod_JK 2 to connect to the application server), a Jakarta Tomcat (6.0.20) servlet container (with AJP 13 as the connector), and a MySQL server (5.1.36) to host auction items (about 48,000 items). For the JMS messaging service, the Joram [9] implementation is used. Joram incorporates a 100% Java implementation of JMS 1.1. It provides access to a truly distributed message-oriented middleware where messages are handled through particular data structures called destinations: queues and topics. Benchmarking these two applications shows that BaaSP can handle various types of applications.

5.1.2. Cloud environment. Our experiments were carried out using the Grid'5000 [16]‡ experimental testbed (the French national grid). As shown in Table III, Grid'5000 is a distributed infrastructure, which is spread over the entire area of France. It is organized in 'sites' (where a site represents a city), which in turn are organized in clusters. For our experiments, we configured two Grid'5000 clusters (Chicon at Lille, in the north of France, and Pastel at Toulouse, in the south of France) to provide the injector cloud and the SUT cloud, respectively (as shown in the BaaSP architecture in Figure 3).
The two clusters run OpenStack [17] through StackOps [18] (version 0.4-b1262d20120223) to provide a virtualized cloud. The virtualization system is KVM version 2.0.0. Each RUBiS and Joram VM is started with 1 GB of memory, while injectors and the other BaaSP VMs used 256 MB. Each VM is pinned to one processor. All VMs run the same operating system as the

‡ Grid'5000 is an initiative of the French Ministry of Research through the ACI GRID incentive action, INRIA, CNRS, RENATER and other contributing partners.
Table III. Configuration of the cloud platform evaluated (Grid'5000).
nodes that host them, that is, a Linux Ubuntu 10.04 distribution with a 2.6.30 kernel, over a gigabit connection. Table III summarizes the configuration and the organization of these two emulated clouds. We used this environment to calibrate the deployment time, TTSVM. This time displays a stepwise behavior. For example, the deployment time for between 1 and 10 VMs is the same (100 s), whereas it grows for 11 to 20 VMs (by 75 s). For readability, in this section, we use TTSVM instead of TTSVM[i] for 1 ≤ i ≤ 10.
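For reference, the measured TTSVM values can be encoded as a simple step function that the Planner can query. The sketch below merely restates the two measured plateaus; the class and method are illustrative, not part of BaaSP.

/** Deployment-time function TTSVM[i] for this testbed, encoded as a step function (illustrative). */
public class DeploymentTime {

    /** Time (s) needed to start i injector VMs simultaneously on the cloud used in Section 5. */
    static int ttsvmSeconds(int vmCount) {
        if (vmCount <= 0) return 0;
        if (vmCount <= 10) return 100;        // measured: same time for 1 to 10 VMs
        if (vmCount <= 20) return 100 + 75;   // measured: 75 s more for 11 to 20 VMs
        throw new IllegalArgumentException("not calibrated beyond 20 VMs");
    }

    public static void main(String[] args) {
        System.out.println("TTSVM[5]  = " + ttsvmSeconds(5) + " s");
        System.out.println("TTSVM[15] = " + ttsvmSeconds(15) + " s");
    }
}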
5.1.3. Server configuration. It should be remembered that the experimental environment contains the following systems: Cloud Middleware, RUBiS (the SUT), CLIF, and BaaSP. Each of them was configured as follows:
- Each RUBiS and Joram server is deployed on a single VM. Deployment and configuration are performed using a BaaSP Deployer component, initially instantiated on the Pastel cluster. The bottleneck for the RUBiS configuration used is the database tier. Thus, RUBiS behavior is observable in this evaluation through its MySQL server node, which is CPU bound. This is also the case for the Joram server.
- The BaaSP core is deployed on a VM on the Chicon cluster. The Calibrator, the Planner, and the ElasticClif are deployed on a single VM.
- Each CLIF injector is deployed automatically when requested, on a separate VM.
5.2. Workload scenarios

Three workload scenarios have been tested. They correspond to three situations: an unfavorable case, a more favorable case, and a more complicated unfavorable case for our BaaSP system. The first two are used to benchmark the RUBiS application, while the final one benchmarks the Joram server. Theoretically, each workload is designed to run in 1200 s. The first workload scenario (Figure 4(a)) represents a 'simple' test workload Ws(t). This workload is composed of two phases: a ramp-up phase (Ws(nt) = n × Ws(t)) followed by a ramp-down phase (Ws(nt) = Ws(t)/n). Together, these form a pyramidal workload. This kind of workload scenario requires the addition/removal of a single injector at a time. The second and the third workload scenarios (Figure 4(b) and (c)) are more complex. They are composed of several different phases: a gentle increase in load, a constant load, a steep ramp-up of load, a steep ramp-down of load, and a gentle ramp-down of load. The second workload shows the benefit of the BaaSP. Unlike the first workload scenario, the second workload scenario sometimes requires the addition/removal of more than one injector at a time. The final workload shows how BaaSP deals with a more complex workload. The ultimate goal of our BaaSP is to minimize the cost of benchmarking an application in a cloud environment. Naturally, the main metric used in this evaluation was therefore the cost of the test. We compare the cost of the test in two situations: deployment of static injectors (called Policy0) versus provisioning of self-scaling injectors through our BaaSP. This second case was evaluated according to the following policies:
- Policy1: injectors are dynamically added/removed without 'just ahead of time' provisioning.
- Policy2: injectors are dynamically added/removed using a 'just ahead of time' provisioning strategy.
The cost of running the test in the cloud depends both on the duration of the test and on the number of VMs used during the test. In these three situations, the duration of the test includes: starting
Figure 4. Experimental workloads.
the SUT and BaaSP, and the time to perform the tests themselves. Calibration time and injector addition/removal time are included when evaluating Policy1 and Policy2.

5.3. Evaluation results

The evaluation results we describe in this section are classed in two categories: quantitative and qualitative. Quantitative evaluation describes the actual results obtained for the behavior of the SUT and cloud test bed environment during the test. In addition, we analyzed how each provisioning policy behaved during the test. Qualitative evaluation formalizes the observations from the quantitative evaluation. Before describing these results, we will first present the variables on which the qualitative evaluation is based:
- Let Cost_tu be the cost of running a VM (as described previously) in the cloud, for a time unit, TU.
- Let TTS be the time necessary to start the VMs running the SUT and the BaaSP.
- Let TTT be the theoretical time to run the benchmark workload (20 min in our case).
- Let nbVMRubis be the number of VMs used to run RUBiS.
- Let nbVMBaaSP be the number of VMs used to run BaaSP components.
- Let nbInj be the maximum number of injectors used during the test (five in our case).
The cost of running the test in the cloud without dynamic provisioning (Policy0), noted Cost0, is therefore given by the following formula:

Cost0 = [nbInj × TTT + (nbVMRubis + nbVMBaaSP) × (TTT + TTS)] × Cost_tu

5.3.1. Workload scenario 1. We compare the cost of Policy0 to the two other policies: Policy1 (injectors are added without a 'just ahead of time' approach) and Policy2 (injectors are added with a 'just ahead of time' approach). Figure 5(1) presents the actual results of the execution of this first workload scenario. It shows two types of curves: (a) the variation in CPU load for the RUBiS database tier (remember that this is the bottleneck of our RUBiS configuration) and (b) the injector provisioning rate during the test. These curves can be interpreted as follows:
- The behavior of the SUT follows the specified workload (pyramidal). This workload is seen
to saturate the MySQL node in terms of CPU consumption (100%) in the middle of the load profile.
- The execution of the test with Policy1 behaves incorrectly, while Policy0 and Policy2 display the same behavior, which corresponds to the expected behavior (based on the workload scenario). In fact, Policy1 extends the test duration beyond the theoretical duration specified in the workload scenario: about 400 s, corresponding to range (c) shown in Figure 5(1).
- We observe a series of steps during the upward phase with Policy1, in contrast with Policy2. This is because the deployment time for a new injector is not anticipated by Policy1, while it is with Policy2. We do not observe the same phenomenon in the descending phase because injector removal is immediate.
As shown in curve (b) in Figure 5(1), up to five injector VMs are used for each workload scenario. Remember that TTSVM is the time required by the BaaSP to add an injector (about 100 s in our experiment). Let TTCal be the time used by the Calibrator to calibrate an injector (60 s in our experiment) and InjMaxCapacity be the maximum capacity of an injector (40 vUsers in our experiment). With Policy1, as shown, the test runs for longer than the theoretical time. This extra time corresponds to the sum of the deployment times of all injectors added by the Planner during the test, (nbInj − 1) × TTSVM. Let TTSI be the time needed to saturate an injector during the ramp-up phase (120 s in our experiment). With Policy1, TTSI + TTSVM is the provisioning frequency during the ramp-up phase, whereas it is TTSVM in the ramp-down phase. With Policy2, in contrast, TTSVM is the provisioning frequency in both the ramp-up and ramp-down phases. Figure 5(2) shows the execution time for each injector using Policy0, Policy1, and Policy2. The formulas used to evaluate the cost of these different cases are presented here. Let ExecTime_i be the execution time of injector i and ExecTimeRubisBaaSP be the execution time for RUBiS and BaaSP VMs.
Figure 5. (1) Injector provisioning with the Benchmark-as-a-Service platform, performed without 'just ahead of time' provisioning (Policy1) and with it (Policy2); (2) execution time and cost of the tests in different situations.
- ExecTimeRubisBaaSP = (nbVMRubis + nbVMBaaSP) × (ExecTime_1 + TTS), the execution time for the SUT and BaaSP VMs on the cloud.

Policy1:
- Let f(i) = TTT + (nbInj − i) × TTSVM − (i − 1) × 2 × TTSI be an intermediate function.
- ExecTime_1 = TTCal + f(1); the execution time of the first injector includes the calibration time.
- ExecTime_i = f(i), 2 ≤ i ≤ nbInj
- Cost1 = Cost0 + [ (nbInj × (nbInj − 1)/2) × (TTSVM − 2 × TTSI) + TTCal + (nbVMRubis + nbVMBaaSP) × (TTCal + (nbInj − 1) × TTSVM) ] × Cost_tu   (E1)

Policy2:
- Let g(i) = TTT − (i − 1) × 2 × TTSI + TTSVM be an intermediate function.
- ExecTime_1 = TTCal + g(1). The execution time of the first injector includes the calibration time.
- ExecTime_i = g(i), 2 ≤ i ≤ nbInj
- Cost2 = Cost0 + [ (nbVMRubis + nbVMBaaSP + 1) × TTCal + nbInj × TTSVM − 2 × nbInj × (nbInj − 1) × TTSI ] × Cost_tu   (E2)

Equations (E1) and (E2) indicate that dynamic provisioning is less expensive than static execution when:

Policy1: TTSI > (nbVMRubis + nbVMBaaSP + 1) × TTCal / [nbInj × (nbInj − 1)] + (nbInj/2 + nbVMRubis + nbVMBaaSP) × (nbInj − 1) × TTSVM / [nbInj × (nbInj − 1)]   (C1)

Policy2: TTSI > (nbVMRubis + nbVMBaaSP + 1) × TTCal / [2 × nbInj × (nbInj − 1)] + TTSVM / [2 × (nbInj − 1)]   (C2)

When condition (C1) holds, Cost1 ≤ Cost0; when condition (C2) holds, Cost2 ≤ Cost0. In our experimental environment, we have nbInj = 5, nbVMRubis = 3, nbVMBaaSP = 2, TTS = 250 s, TTCal = 60 s, TTSVM = 100 s, and TTSI = 120 s. In this context, (C1) is not met, whereas (C2) is. Then, Cost0 = 13250 × Cost_tu, Cost1 = 14210 × Cost_tu, and Cost2 = 9310 × Cost_tu.
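These figures can be checked by plugging the experimental constants into Cost0 and into equations (E1) and (E2) as given above. The short program below does exactly that; it is only a numerical verification aid, not part of the BaaSP implementation.

/** Numerical check of Cost0, Cost1, and Cost2 with the experimental constants (Section 5.3.1). */
public class CostModel {

    public static void main(String[] args) {
        int nbInj = 5, nbVMRubis = 3, nbVMBaaSP = 2;
        int TTT = 1200, TTS = 250, TTCal = 60, TTSVM = 100, TTSI = 120;  // all in seconds
        int nbVMFixed = nbVMRubis + nbVMBaaSP;

        // Static provisioning (Policy0).
        int cost0 = nbInj * TTT + nbVMFixed * (TTT + TTS);

        // Dynamic provisioning without 'just ahead of time' deployment (E1).
        int cost1 = cost0
                + nbInj * (nbInj - 1) / 2 * (TTSVM - 2 * TTSI)
                + TTCal
                + nbVMFixed * (TTCal + (nbInj - 1) * TTSVM);

        // Dynamic provisioning with 'just ahead of time' deployment (E2).
        int cost2 = cost0
                + (nbVMFixed + 1) * TTCal
                + nbInj * TTSVM
                - 2 * nbInj * (nbInj - 1) * TTSI;

        System.out.println("Cost0 = " + cost0 + " Cost_tu");  // 13250
        System.out.println("Cost1 = " + cost1 + " Cost_tu");  // 14210
        System.out.println("Cost2 = " + cost2 + " Cost_tu");  // 9310
    }
}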
Figure 6. (1) Injector provisioning in the Bench-as-a-Service platform ('just ahead of time' provisioning, Policy3); (2) execution time and cost of the tests in different situations.
5.3.2. Workload scenario 2. Figure 6 shows the results for the second workload scenario. These results can be interpreted in a similar fashion to the previous workload scenario. Looking at the workload scenario, curve (b) of Figure 6(1) shows the injector provisioning:
- Three injectors are simultaneously added at time T1 when using 'just ahead of time' provisioning. This is due to the sharp ramp-up phase occurring from time 550 to 600 s, which is less than TTSVM (refer to the planning Algorithm 1).
- In contrast, injectors are added one at a time (solid line) when 'just ahead of time' provisioning is not used. As for the first workload scenario, this configuration induces incorrect test behavior because it increases execution time.
Unlike for the first workload scenario, we only formalize and evaluate the BaaSP with 'just ahead of time' provisioning in this scenario. In fact, the formalization of the other configuration is similar to that given for the first workload scenario. Figure 6(2) shows the execution time for this experiment compared with that of static execution. By using the same variables as in the previous section, the cost of this experiment (Cost3) is evaluated as follows:
- ExecTime_1 = TTCal + g(1). ExecTime_i and g(i) are defined in the first workload scenario.
- ExecTime_2 = g(2)
- ExecTime_i = TTSVM + 100 s, 3 ≤ i ≤ 5. These injectors only run during the peak in the load profile.
- Cost3 = [ (nbVMRubis + nbVMBaaSP) × (TTT + TTS + TTCal) + Σ_{i=1..nbInj} ExecTime_i ] × Cost_tu
- Cost3 = 10570 × Cost_tu
The value of Cost3 compared with Cost0 (which is constant) shows that for some workload scenarios (e.g., long test campaigns), using dynamic injector provisioning becomes more advantageous.

5.3.3. Workload scenario 3. We benched the Joram server with up to 40,000 vUsers. A vUser behaves as follows: for each time unit, it sends and consumes a message of 100 octets. Thus, the maximum load generated by workload scenario 3 is 40,000 messages per second. Calibration of an injector VM results in a maximum capacity of 8000 vUsers. This implies that BaaSP requires up to five injector VMs to perform this third workload scenario. Figure 7 shows injector provisioning and the Joram server's behavior (in terms of CPU load). The legend of this figure is equivalent to the first
Figure 7. Injector provisioning for the third workload scenario.
two workload scenarios. As discussed in the two previous sections, provisioning injectors ahead of time (Policy3) allows BaaSP to precisely follow the shape of the designated workload; this is not the case when injectors are not provisioned ahead of time (Policy2). Despite the complexity of the workload, BaaSP plans and anticipates the addition/removal of injector nodes ahead of sudden changes. Thus, when considering Policy3:
- The five injectors are all started initially to deal with the initial abrupt ramp-up phase. The same behavior is observed before the abrupt ramp-up phase at time 1000 s.
- No injector is removed between times 200 and 300 s even though the workload decreases. Indeed, BaaSP anticipates the abrupt ramp-up phase that follows the decreasing phase, which occurs within 100 s (the deployment time of a VM on the cloud).

6. RELATED WORK

6.1. Scalability

Several research works have investigated the advantages of VM scalability or cloud elasticity in order to optimize physical machine utilization in the cloud. Jamal et al. [19] study the scalability of VMs on multi-core architectures. The conclusions of [19] make it possible to choose which types of workloads can be collocated (via VMs) on the same host while minimizing application performance degradation. Jayasinghe et al. [20] study the vertical and the horizontal scalability of n-tier applications on three widely used cloud platforms in order to compare them. In the same vein, SmartScale [21] is an automated scaling framework that uses a combination of vertical and horizontal scaling operations. SmartScale mainly deals with how to combine horizontal and vertical operations to minimize operating costs. This approach is very complex, but SmartScale uses a simplification aimed at avoiding combinatorial explosion when computing solutions. Vaquero et al. [22] highlight pending challenges to make elasticity possible and present an ideal scalable cloud system. Although no framework is proposed, this paper is interesting because it surveys existing solutions aimed at making clouds elastic. Marshall et al. [23] focus on web site elasticity. Their solution consists of a resource manager on top of the cloud, which dynamically and securely extends a web site's resources according to demand, which the solution itself monitors.

6.2. Benchmarking

A variety of benchmarking tools exist, ranging from simple tools (such as [24]) to complex ones (e.g., [14]). These tools can be classed into two main categories: specialized and adaptable tools. Specialized tools are dedicated to a specific SUT (hardware or software), whereas adaptable tools can be used for different SUTs. In the first category, the Whetstone benchmark [25] appears as the first and most popular early tool. It aims to provide benchmark programs measuring the performance of computers. In this context, its program committee defined and made popular a set of industry standards to measure a computer's performance (based on the number of instructions executed per second). In the same category appears the Transaction Processing Performance Council benchmark tool suite [26], which offers different specific benchmarking programs to benchmark transaction operations in complex systems. The Standard Performance Evaluation Corporation [27] and the DBench project [28] share the philosophy of the Transaction Processing Performance Council. However, the Standard Performance Evaluation Corporation does not focus on any single domain and proposes many benchmarking programs for a large range of applications (web site, database, virtualization, etc.). Many studies in the cloud area have demonstrated the benefit of dynamic resource allocation [29–31]. However, few studies have addressed adaptive benchmarking tools. Nevertheless, we found some articles dealing with this topic. Unibench [32] is an automated benchmarking tool. Like our BaaSP, Unibench can remotely deploy both the SUT and the benchmarking components in a cluster. It is adaptive in its identification of changes to the SUT, after which it performs another benchmarking process to evaluate the modification. To do this, Unibench must have access to the source code and the programming language of the SUT.
Unlike with our BaaSP, the SUT is not considered
as a black box. Almeida and Vieira [33] present research challenges for implementing benchmarking tools for self-adaptive systems. Apart from the definition of metrics and some principles to be applied when defining workloads, their work does not deal with self-adaptation of the benchmarking tool itself. Manley et al. [34] present hbench:Web, a scalable load injector framework for web applications, based on real traces. The scalability considered by hbench:Web is its ability to adapt to future predicted loads and to change the composition of a workload without damaging its realism.

6.3. Benchmarking on the cloud

CloudGauge [35] is an open source framework similar to ours. It uses the cloud environment as the benchmarking context. Unlike our BaaSP, which evaluates a SUT running in the cloud, the SUT for CloudGauge is the cloud itself and its capacity for VM consolidation. It dynamically injects workloads into the cloud VMs and adds/removes/migrates VMs according to the fluctuating workload. Like Selfbench [12] (the calibration system we used), CloudGauge can adjust the workload during the benchmarking process. Indeed, like our BaaSP, users can define a series of workloads for benchmarking. Because the SUT is the cloud, injectors are deployed inside VMs. There is no separation between injector nodes and SUT nodes. Thus, unlike our BaaSP, there is no need to dynamically create injector nodes. The architecture of CloudGauge shares some similarities with ours. For example, CloudGauge defines an orchestrator called Test Provisioning, which is responsible for orchestrating the benchmarking process. Other tools, such as VSCBenchmark [36] and VMmark [37], are comparable with CloudGauge. They allow dynamic workloads to be defined for consolidated VM benchmarking in a cloud environment. As far as we know, no open-source benchmarking framework with characteristics comparable to ours exists. However, some proprietary and commercial tools share a similar approach. On the basis of the marketing information for BlazeMeter [38], this tool provides the same features as ours (except SUT deployment): dynamic injector allocation and de-allocation in the cloud to reduce test costs. BlazeMeter is an evolution of the JMeter [24] tool for cloud platforms. Because it is proprietary, no technical or scientific description is available. Therefore, it is difficult to compare its functionality with that of our platform. NeoLoad [39] is another tool similar to BlazeMeter. It allows deployment of injectors in a cloud environment for application benchmarking. It can integrate new injectors throughout the benchmarking process. However, this integration must be initiated by the administrator during planning. NeoLoad does not implement an automated planning component itself, as our BaaSP does. Expertus [40] automates the benchmarking process but does not implement dynamic injector provisioning or automated load generation features. One of the advantages of Expertus is that it generates code to automate the execution of a set of tests.

7. CONCLUSION

This paper explores cloud computing facilities to ease the benchmarking of applications and the testing of their scalability. Load testing solutions can be provided on demand in the cloud and can benefit from self-scalability. We describe a BaaS solution that provides a number of benefits in terms of cost and resource savings. The cost of hardware, software, and tools is charged on a pay-per-use basis, and platform setup is also greatly simplified.
The self-scalability property of the platform eases the benchmarking process and reduces its cost for lengthy campaigns because it does not require static provisioning of the platform, which can be prohibitive in terms of human and hardware resources. Resource provisioning is minimized while ensuring load injection according to a given profile. Our experiments based on the RUBiS benchmark show the benefits of our system in terms of cost reduction for lengthy testing campaigns. As far as we know, our BaaSP is the only one that automatically scales the resources used for load injection. Besides the re-engineering of a load injection tool to enable self-scalability, the main concerns were: (i) online injector calibration; (ii) computation of the right number of injector VMs from the load profile and the injector characterization; and (iii) the control of their provisioning sufficiently ahead of time.
We tested our solution with three workload scenarios, with the benchmarking tool and the SUTs (web and JMS messaging services) deployed in different geographical zones. For each workload scenario, we evaluated a first, simple dynamic provisioning policy that showed incorrect behavior in terms of test duration. We then evaluated a second policy using 'just ahead of time' VM provisioning. This second policy showed correct behavior, equivalent to a static provisioning policy, while reducing VM up-time throughout the campaign. In the future, we plan to add auto-scalability to the SUT side of our tool and to enhance our BaaSP to report resource provisioning of the self-scalable SUT itself. This will allow us to determine under which load conditions the SUT scales up or down. We also plan to add a mode to our platform in which load profiles are no longer required; instead, the platform automatically provisions and controls load injection resources until the SUT is saturated. The difficulty here is to progressively stress an application close to its limits while preventing thrashing. This requires fine-grained load injection control and provisioning.

ACKNOWLEDGEMENTS
This work is supported by the French Fonds National pour la Société Numérique (FSN) and the Pôles Minalogic, Systematic and SCS, through the FSN Open Cloudware project.

REFERENCES

1. Zhang J, Cheung SC. Automated test case generation for the stress testing of multimedia systems. Software-Practice & Experience 2002; 32(15):1411–1435.
2. Bayan MS, Cangussu JW. Automatic stress and load testing for embedded systems. In Proceedings of the Annual International Computer Software and Applications Conference, Chicago, Illinois, USA, 2006; 229–233.
3. Mi N, Casale G, Cherkasova L, Smirni E. Injecting realistic burstiness to a traditional client-server benchmark. In Proceedings of the International Conference on Autonomic Computing, Barcelona, Spain, 2009; 149–158.
4. Barna C, Litoiu M, Ghanbari H. Autonomic load-testing framework. In Proceedings of the International Conference on Autonomic Computing, Karlsruhe, Germany, 2011; 910–100.
5. Blake R, Breese JS. Automatic Bottleneck Detection. Technical Report from Microsoft Research, number MSR-TR95-10, 1995.
6. Chung I-H, Cong G, Klepacki D, Sbaraglia S, Seelam S, Wen HF. A framework for automated performance bottleneck detection. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing, Miami, Florida, USA, 2008; 1–7.
7. Koehler S, Stitt G, George AD. Platform-aware bottleneck detection for reconfigurable computing applications. ACM Transactions on Reconfigurable Technology and Systems 2011; 4(3):1–30.
8. Amza C, Ch A, Cox AL, Elnikety S, Gil R, Rajamani K, Cecchet E, Marguerite J. Specification and implementation of dynamic web site benchmarks. In Proceedings of the 5th IEEE Workshop on Workload Characterization, Austin, TX, USA, 2002; 3–13.
9. JORAM: Java (TM) Open Reliable Asynchronous Messaging. Available at: http://joram.ow2.org/ [last accessed November 2013].
10. Jain RK. The art of computer systems performance analysis: Techniques for experimental design, measurement, simulation, and modelling. ACM SIGMETRICS Performance Evaluation Review 1991; 19(2):5–11.
11. Kant K, Tewari V, Iyer RK. Geist: A web traffic generation tool. In Proceedings of the International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, London, UK, 2002; 227–232.
12. Harbaoui A, Salmi N, Dillenseger B, Vincent J-M. Introducing queuing network-based performance awareness in autonomic systems. In Proceedings of the Sixth International Conference on Autonomic and Autonomous Systems, Cancun, Mexico, 2010; 7–12.
13. The CLIF Project. Available at: http://clif.ow2.org [last accessed November 2013].
14. Dillenseger B. CLIF, a framework based on fractal for flexible, distributed load testing. In Annals of Telecommunications, Vol. 64. Springer: Paris. Numbers 1-2, Issue 1, 2009; 101–120.
15. Bruneton E, Coupaye T, Leclercq M, Quema V, Stefani JB. An open component model and its support in Java. In Proceedings of the International ACM SIGSOFT Symposium on Component-Based Software Engineering, Edinburgh, UK, 2004; 7–22.
16. Bolze R, Cappello F, Caron E, Daydé M, Desprez F, Jeannot E, Jégou Y, Lanteri S, Leduc J, Melab N, Mornet G, Namyst R, Primet P, Quetier B, Richard O, Talbi EG, Touche I. Grid'5000: a large scale and highly reconfigurable experimental grid testbed. International Journal of High Performance Computing Applications 2006; 20(4):481–494.
17. OpenStack Web Site. Available at: http://openstack.org/ [last accessed November 2013].
18. StackOps Web Site. Available at: http://www.stackops.com [last accessed November 2013].
19. Jamal MH, Qadeer A, Mahmood W, Waheed A, Ding JJ. Virtual machine scalability on multi-core processors based servers for cloud computing workloads. In Proceedings of the IEEE International Conference on Networking, Architecture, and Storage, Zhang Jia Jie, Hunan, China, 2009; 90–97.
20. Jayasinghe D, Malkowski S, Wang Q, Li J, Xiong P, Pu C. Variations in performance and scalability when migrating n-tier applications to different clouds. In Proceedings of the IEEE International Conference on Cloud Computing, Washington, DC, USA, 2011; 73–80.
21. Dutta S, Gera S, Verma A, Viswanathan B. SmartScale: automatic application scaling in enterprise clouds. In Proceedings of the IEEE International Conference on Cloud Computing, Honolulu, Hawaii, USA, 2012; 221–228.
22. Vaquero LM, Rodero L, Buyya R. Dynamically scaling applications in the cloud. ACM SIGCOMM Computer Communication Review 2011; 41(1):45–52. ISSN 0146-4833.
23. Marshall P, Keahey K, Freeman T. Elastic site: using clouds to elastically extend site resources. In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Melbourne, Australia, 2010; 43–52.
24. The Apache Software Foundation. Apache JMeter. Available at: http://jmeter.apache.org/ [last accessed November 2013].
25. Curnow HJ, Wichmann BA. A synthetic benchmark. Computer Journal 1976; 19(1):43–49.
26. Transaction Processing Performance Council (TPC). Available at: http://www.tpc.org/ [last accessed November 2013].
27. Standard Performance Evaluation Corporation (SPEC). Available at: http://www.spec.org/ [last accessed November 2013].
28. Dependability Benchmarking Project (IST-2000-25425). Available at: http://webhost.laas.fr/TSF/DBench/ [last accessed November 2013].
29. Tchana A, Temate S, Broto L, Hagimont D. Autonomic resource allocation in a J2EE cluster. 3rd International Conference on Utility and Cloud Computing, Chennai, India, 2010.
30. Amazon Web Services. Amazon EC2 Auto-Scaling Functions. Available at: http://aws.amazon.com/fr/autoscaling/ [last accessed November 2013].
31. Rightscale Web Site. Available at: http://www.rightscale.com [last accessed November 2013].
32. Rolls D, Joslin C, Scholz SB. Unibench: a tool for automated and collaborative benchmarking. In Proceedings of the 18th IEEE International Conference on Program Comprehension, Braga, Portugal, 2010; 50–51.
33. Almeida R, Vieira M. Benchmarking the resilience of self-adaptive software systems: perspectives and challenges. In Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Waikiki, Honolulu, HI, USA, 2011; 190–195.
34. Manley S, Courage M, Seltzer M. A self-scaling and self-configuring benchmark for web servers. In Proceedings of SIGMETRICS, Madison, Wisconsin, USA, 1998; 270–271.
35. El-Refaey MA, Rizkaa MA. CloudGauge: a dynamic cloud and virtualization benchmarking suite. In Proceedings of the 2010 19th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, Larissa, Greece, 2010; 66–75.
36. Jin H, Cao W, Yuan P, Xie X. VSCBenchmark: benchmark for dynamic server performance of virtualization technology. In Proceedings of the 1st International Forum on Next-Generation Multicore/Manycore Technologies, Cairo, Egypt, 2008; 1–8.
37. Makhija V, Herndon B, Smith P, Roderick L, Zamost E, Anderson J. VMmark: a scalable benchmark for virtualized systems. Technical Report VMware-TR-2006-002, Palo Alto, CA, USA, September 2006. Available at: http://www.vmware.com/pdf/vmmark_intro.pdf [last accessed November 2013].
38. BlazeMeter. Available at: http://blazemeter.com/ [last accessed November 2013].
39. Neotys. NeoLoad: Load Test All Web and Mobile Applications. Available at: http://www.neotys.fr/ [last accessed November 2013].
40. Jayasinghe D, Swint GS, Malkowski S, Li J, Park J, Pu C. Expertus: a generator approach to automate performance testing in IaaS clouds. In Proceedings of the IEEE International Conference on Cloud Computing, Honolulu, HI, USA, June 2012; 115–122.