A Middleware for Reflective Web Service Choreographies on the Cloud

Thiago Furtado*, Emilio Francesquini†, Nelson Lago*, Fabio Kon*
* Department of Computer Science, University of São Paulo, Brazil
{tfurtado, lago, kon}@ime.usp.br
† Institute of Computing, University of Campinas, Brazil
[email protected]

ABSTRACT
Web service composition is a commonly used approach to build distributed systems on the cloud. Choreographies are one specific kind of service composition in which the responsibility for the execution of the system is shared by its service components, without a central point of coordination. Due to the distributed nature of these systems, a manual approach to resource usage monitoring and to allocation aimed at maintaining the expected Quality of Service (QoS) is not only inefficient but also does not scale. In this paper, we present an open source choreography enactment middleware that is capable of automatically deploying and executing a composition. Additionally, it monitors the composition execution to perform automatic resource provisioning and dynamic service reconfiguration based on pre-defined Service Level Agreement (SLA) constraints. To achieve that, it keeps a meta-level representation of the compositions, which contains their specifications, deployment statuses, and QoS attributes. Application developers can write specific rules that take this meta-data into account to reason about the performance of the composition and change its behavior. Our middleware was evaluated on Amazon EC2, and our results demonstrate that, with little effort from the choreography developer or deployer, the middleware is able to maintain the established SLA using both horizontal and vertical scaling when faced with varying levels of load. Additionally, it reduces operational costs by using as few computational resources as possible.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Distributed programming

General Terms
Reliability, Performance

Keywords
Middleware, QoS, Reflection, SOA

1. INTRODUCTION

A computational system is said to be scalable if it is able to perform its duties in an acceptable manner (as defined by the user) even in the event of wide variations in load. Cloud environments offer these systems the possibility of scaling up or down in a very straightforward manner. Therefore, a well-designed middleware for these environments should be able to react to infrastructure and system conditions. Reactions might include, for example, the automatic use of additional computing resources.

A common approach to building a distributed system is to use a Service-Oriented Architecture (SOA) [10, 16]. This architecture has enjoyed widespread adoption mainly because it facilitates interoperability and the reuse of legacy components. The basic component of a SOA-based system is a service. Services are independent entities that provide specific functionalities to their clients. If the desired functionality requires the use or cooperation of several services, they can be organized into compositions. Compositions with a centralized control structure are known as orchestrations, whereas compositions without a centralized control structure are known as choreographies [8, 1].

As each service provides different functionalities, services are bound to different usage patterns. Not only does the intrinsic functionality of each service influence its load, but services are also subject to fluctuations due to the time of day or the day of the week. For instance, a shopping application might experience a high load during the evening but face very low demand during the morning. These variations in load are reflected in the amount of CPU and I/O resources required to keep the service working within the acceptable QoS level. In this context, it is desirable that the middleware responsible for the execution of a choreography on the cloud also become responsible for allocating and releasing resources in accordance with the load.

To maintain the expected QoS levels, the resources allocated to each service must be tuned continuously, thus adapting to load variations. This adaptability requires a fundamental change in the way services are deployed and managed. A choreography that is capable of "reasoning" about its current state, with the help of the underlying middleware, and of dynamically reconfiguring its own internal components (services) and connections (service bindings) becomes, therefore, reflective. In this sense, reflective means that the system is aware of its own structure and is capable of dynamically adapting this structure according to runtime needs [9].

Our reflective choreography management middleware focuses on the deployment and QoS management of choreographies in order to serve as a Platform as a Service (PaaS) system for scalable choreographies. It represents each choreography as an abstract specification. This representation includes the component services and the relationships these services have among themselves as well as with other services. By keeping that information at the meta-data level, the choreography developer or deployer can use the middleware support to encode rules that reason about the choreography state and redesign its own deployment specifications to comply with the established SLAs. After these specifications are recalculated, the middleware can automatically perform the necessary deployment actions.

As a significant improvement over the previous version of our work [6], in this paper we present a middleware to support reflective choreographies that are capable of dynamic reasoning and adaptation to changing environmental parameters to keep the QoS within the specified SLAs. To evaluate the effectiveness of our approach, we performed an experimental evaluation in which we analyze SLA violations when vertical and horizontal scaling approaches are applied to a choreography deployment. Our results show that the resulting system is able to automatically and dynamically maintain the QoS within acceptable levels even under large load variations. This is achieved by using additional resources when the load increases and releasing underused resources when the load decreases. The middleware greatly facilitates the creation of these adaptive choreographies, saving the programmer from the tedious task of writing hundreds of lines of code.

The remainder of this paper is organized as follows. Section 2 briefly describes how reflection can help a middleware maintain the QoS of service compositions. Section 3 presents our proposal for an event-based middleware with support for the dynamic scaling and reconfiguration of choreographies. We present the experimental evaluation in Section 4 and compare our proposal to related work in Section 5. Finally, we conclude in Section 6.
2. BACKGROUND AND MOTIVATION
Choreographies are a structural approach to connecting services in order to build a service composition. In this setting, services are entities responsible for a pre-defined role, and service dependencies are strictly based on those roles. Additionally, each service taking part in a choreography is expected to produce correct results and to manage the communication with the services it depends on.

A commonly accepted way to provide scalability is the use of stateless services, since each distinct deployment of a stateless service can be used interchangeably. Conversely, stateful services tend to be more complex, since they need to keep explicit session control mechanisms and, thus, restrict the choreography composition and execution choices. The use of stateless services therefore has the advantage of allowing our middleware to instantiate as many replicas as needed and deploy them on different kinds of machines (e.g., with different processing power) to maintain adequate performance levels. In this work, we are interested in the former, i.e., stateless services.

Service providers, consumers, and other stakeholders may negotiate QoS characteristics; the details of this negotiation and the relevant QoS metrics vary, but we consider the result of this negotiation to be an SLA: basically, a set of QoS constraints. This requires a QoS model that allows each client to define its expected QoS levels. Table 1 lists examples of representative non-domain-specific QoS properties. Our system is flexible enough to handle various QoS parameters and SLAs according to the needs of the user by means of user-defined rules.

Property         Definition
Response Time    The amount of time elapsed between sending a request and receiving the response.
Throughput       The number of requests a service can process in a given time unit, normally measured in requests/second.
Availability     Often measured as a percentage; defined as uptime / (uptime + downtime), where uptime represents the time a service is running and answering requests and downtime the time the service is not answering requests.
Financial Cost   The monetary value owed to the infrastructure provider for the use of resources.

Table 1: QoS Properties

The management of a composition's QoS demands, in addition to the QoS model, dynamic adaptation mechanisms. To that end, in this work we employ Complex Event Processing (CEP) and reflection. CEP [5] is an event-based approach used to measure system properties through the analysis of events and their comparison to the related SLAs. Complex events can also be employed to trigger actions indicating which reconfiguration should be done to keep the expected QoS levels. A complex event can be defined as the correlation between a set of other events (simple or complex). A complex event could be generated, for instance, when a higher-than-normal operation response time event occurs concomitantly with high CPU and I/O usage at the server side. CEP is generally regarded as a practical and efficient choice for monitoring and executing reactive tasks in distributed systems [5].

Additionally, the reconfiguration of choreographies can be further improved by knowledge of the facts related to their deployment. In this text, an event is a representation of a functional characteristic of the composition, for example, the response time. A fact is a representation of a structural state, for example, the IP address of a service instance or the list of cloud nodes on which some service is currently deployed. When the monitoring framework indicates that some correlation of events should trigger a reconfiguration, the middleware can reflectively reason using information contained in the meta-data layer to perform changes and dynamically adapt the deployment of the compositions on the cloud environment, ensuring the expected QoS is met.
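To make the distinction between events, facts, and complex events concrete, the deliberately simplified Java sketch below correlates a response time event with a CPU usage event for the same service instance to derive a complex event. All class names, fields, and thresholds here are hypothetical illustrations and are not part of our middleware's actual event model.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified event model, for illustration only.
record ResponseTimeEvent(String instance, long millis, long timestamp) {}
record CpuUsageEvent(String instance, double percent, long timestamp) {}
record HighLoadEvent(String instance) {}  // the derived (complex) event

class MiniCorrelator {
    // A fact: structural state that stays in memory until explicitly
    // removed, e.g., the set of nodes on which a service is deployed.
    private final Set<String> deployedInstances = new HashSet<>();

    void instanceDeployed(String ip) { deployedInstances.add(ip); }
    void instanceReleased(String ip) { deployedInstances.remove(ip); }

    // A complex event is the correlation of simpler events: here, a slow
    // response occurring together with high CPU usage on the same
    // (currently deployed) instance. Thresholds are illustrative.
    Optional<HighLoadEvent> correlate(ResponseTimeEvent rt, CpuUsageEvent cpu) {
        boolean sameInstance = rt.instance().equals(cpu.instance());
        boolean deployed     = deployedInstances.contains(rt.instance());
        boolean concurrent   = Math.abs(rt.timestamp() - cpu.timestamp()) < 30_000;
        if (sameInstance && deployed && concurrent
                && rt.millis() > 1000 && cpu.percent() > 90.0) {
            return Optional.of(new HighLoadEvent(rt.instance()));
        }
        return Optional.empty();  // no correlation detected
    }
}
```

In a real CEP engine, the correlation window and thresholds would be declared in rules rather than hard-coded, as shown later in Figure 2.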
3. MIDDLEWARE ARCHITECTURE
Our middleware architecture divides responsibilities into two main categories. The first, monitoring, uses CEP to evaluate both QoS properties and resource usage; it also employs CEP to carry out event correlation analysis. The second category, reflection, acts on the results of those evaluations: if some action is needed to maintain QoS levels, the middleware reflectively performs an introspection analysis and reifies the necessary changes.

On-the-fly reconfiguration of choreographies can be done using several approaches [3]. In this work, we employ the well-known Monitoring, Analysis, Planning, Execution (MAPE) loop [15], with the analysis phase implemented using CEP. In this approach, the four phases are executed repeatedly for each piece of information collected from resources and services. The first phase is responsible for collecting information about infrastructure usage and QoS properties. That information is then analyzed (by the Complex Event Processor) in search of potential SLA violations. Strategies to overcome QoS problems can then be devised using the insight brought by the analysis. The execution, the last step of the loop, is when the reconfiguration is applied.

Our middleware architecture uses a MAPE approach with reflective and event-based monitoring and deployment. Figure 1 depicts its simplified architectural diagram.

[Figure 1: Simplified architectural diagram. Monitoring probes on each cloud node send events to the Resource Manager Aggregator; choreography specification updates (reflection) are sent to the Choreography Specification Storage (meta-level), which issues choreography deployment updates (reification) to the Deployment Manager.]

The middleware is implemented using three main modules running on dedicated nodes: the Deployment Manager (DM), the Resource Manager Aggregator (RMA), and the Choreography Specification Storage (CSS). The DM module contains the mechanisms to control cloud deployments, such as the allocation and release of nodes or the deployment of a service on a specific node. The RMA module is responsible for monitoring and analyzing events and for reflectively determining choreography reconfigurations through updates sent to the CSS module. The CSS module keeps information about the choreographies and their deployment. When a choreography reconfiguration is requested to the CSS, it uses the DM to reify the needed changes in the environment according to the new deployment specifications.

Every deployed choreography is constantly monitored for several QoS attributes. The measurements are made by monitoring probes present on every middleware-managed cloud node. These probes generate periodic events, which are sent to the RMA. The RMA then uses a rule engine to perform CEP on them. Events are analyzed and correlated to known system behaviors. A single event or a set of correlated events might indicate, for instance, that the performance of a service is problematic or that more resources than necessary were allocated. In either case, a rule engine is employed to determine the best reactive course of action.

Once the analysis is completed and it is determined that a reconfiguration is needed, the RMA devises a reconfiguration plan and sends it to the CSS. The CSS calculates the necessary changes to the choreography specifications and notifies the DM so that the effective reconfiguration is executed, or reified in reflection terminology. This calculation takes into account the current status of the deployment (using the meta-data kept by the CSS) to perform only the required changes, in an incremental manner. The reconfiguration might involve resource upscaling or downscaling, performed in a horizontal or vertical fashion, i.e., increasing/decreasing the number of nodes executing the service or migrating the service to a node with larger/smaller capacity¹.

¹ Some virtualization technologies allow users to modify VM resources, such as memory or the number of processing units, on the fly. However, currently, no major cloud provider offers this possibility.
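To summarize the flow just described, the minimal Java sketch below shows one MAPE iteration passing through the three modules. The interfaces, type names, and method signatures are hypothetical and do not correspond to the middleware's actual API.

```java
// Placeholder types to keep the sketch self-contained.
record Event(String source, String payload) {}
record ReconfigurationPlan(String description) {}
record DeploymentDelta(String description) {}

// Hypothetical module interfaces, for illustration only.
interface ResourceManagerAggregator {
    // Analysis + planning: correlate monitored events via CEP and, if an
    // SLA violation is detected, produce a reconfiguration plan (else null).
    ReconfigurationPlan analyze(Iterable<Event> monitoredEvents);
}

interface ChoreographySpecificationStorage {
    // Reflection: update the meta-level specification from a plan and
    // compute only the incremental changes to the current deployment.
    DeploymentDelta updateSpecification(ReconfigurationPlan plan);
}

interface DeploymentManager {
    // Reification: apply the incremental changes on the cloud.
    void apply(DeploymentDelta delta);
}

class MapeLoop {
    private final ResourceManagerAggregator rma;
    private final ChoreographySpecificationStorage css;
    private final DeploymentManager dm;

    MapeLoop(ResourceManagerAggregator rma,
             ChoreographySpecificationStorage css,
             DeploymentManager dm) {
        this.rma = rma; this.css = css; this.dm = dm;
    }

    // One iteration: Monitor -> Analyze/Plan (RMA) -> Reflect (CSS) -> Reify (DM).
    void iterate(Iterable<Event> monitoredEvents) {
        ReconfigurationPlan plan = rma.analyze(monitoredEvents);
        if (plan != null) {                       // reconfiguration needed?
            DeploymentDelta delta = css.updateSpecification(plan);
            dm.apply(delta);                      // incremental changes only
        }
    }
}
```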
3.1 Middleware Implementation

Dynamic reconfiguration depends on monitoring both the services themselves and their underlying infrastructure; that is the monitoring phase of the MAPE loop. Accordingly, the reflective layer of the middleware is tied to an event processing mechanism that consumes events received from every cloud node through two different sources. Since services are deployed using Tomcat, we instrument each Tomcat instance with valves² capable of obtaining QoS-related measurements for each received request. At the same time, resource usage is assessed through probes created with Ganglia [12], a lightweight resource monitoring tool. Whenever a measurement is considered anomalous, it is forwarded to the RMA.

² Tomcat Valves – https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html

Several activities comprise the analysis and planning phases of the MAPE loop. The RMA component keeps relevant events in memory to perform correlation analysis among them. The status of the deployed choreographies is checked to decide whether, and which, reconfiguration plan is needed. To perform CEP on the events, we use Glimpse [2]. Facts about the system are updated as choreography deployments and statuses change; facts are events that remain in memory all the time and must be removed explicitly. Glimpse has an API to the Drools [11] rule engine, which provides flexible SLA monitoring and CEP with temporal reasoning. In this setting, Drools' main role is to detect complex events, correlating virtual machine resource status to QoS measurements.

Once the reconfiguration rules are laid out, the execution phase is triggered whenever a pre-defined correlation of events is detected, producing a specific reaction. Some of the rule-dictated reactions might only affect the way the middleware performs event analysis, thus effectively reconfiguring itself; others will cause actions to be taken on the CSS which, consequently, will trigger the DM. In the first case, when a correlation of events is detected and the evaluation rules (normally based on thresholds) indicate a possible problem with a service, a new complex event is generated and fed back into the system. In the second case, in which complex events are processed, rules that determine changes to the CSS are fired. If, to maintain the pre-established QoS, a reconfiguration is deemed necessary, the CSS calculates the changes and notifies the DM which, in turn, performs the necessary steps to correct the problem. To minimize the impact of these actions, the DM evaluates the difference between the current specification and the reconfigured specification sent by the CSS and only applies the incremental changes needed to reach the new configuration. The entirety of these steps constitutes a choreography update; the reconfiguration mechanisms we use in this work were implemented atop the CHOReOS Enactment Engine platform [4], developed by our research group.
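To illustrate how the DM might apply only incremental changes, the sketch below computes the difference between the current and the target deployment specifications as plain set operations. The types and names are hypothetical and are not the Enactment Engine's actual API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: a deployment maps each service to the set of nodes hosting it.
record Deployment(Map<String, Set<String>> serviceToNodes) {}

class DeltaCalculator {
    // Nodes on which `service` must be newly deployed:
    // those in the target specification but not in the current one.
    static Set<String> toDeploy(Deployment current, Deployment target, String service) {
        Set<String> result = new HashSet<>(
            target.serviceToNodes().getOrDefault(service, Set.of()));
        result.removeAll(
            current.serviceToNodes().getOrDefault(service, Set.of()));
        return result;
    }

    // Nodes from which `service` must be removed:
    // those in the current specification but not in the target one.
    static Set<String> toUndeploy(Deployment current, Deployment target, String service) {
        Set<String> result = new HashSet<>(
            current.serviceToNodes().getOrDefault(service, Set.of()));
        result.removeAll(
            target.serviceToNodes().getOrDefault(service, Set.of()));
        return result;
    }
}
```

Applying only these two sets, rather than redeploying the whole choreography, keeps reconfigurations fast and minimizes service disruption.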
4. EXPERIMENTAL EVALUATION
We evaluated the proposed middleware experimentally using a simple benchmark choreography. Both the benchmark and our middleware are open source and licensed under the Mozilla Public License V2. Their code is available at: https://github.com/choreos/enactment_engine/releases/tag/v2014-09.
4.1 Experimental Setup
The benchmark choreography consists of two services: Service A, which exposes an operation that runs a CPU-intensive task (a naïve recursive version of the Fibonacci algorithm), and Service B, which performs several requests to Service A. The load brought on by these calls is evenly distributed among the various replicas of Service A.

Our evaluation is based on the observation of the middleware's reactions to variations in the services' response time. This variation is induced by a test client application that exercises distinct access patterns, with different numbers of requests per minute. To simulate situations likely to be faced in a real production environment, we used real access patterns from Wikipedia³. For practical purposes, the original 24-hour trace was linearly mapped to two hours of simulated accesses.

Due to the nature of the benchmark, our evaluation employs the simplifying assumption that response time alone is enough to evaluate the QoS of the composition. However, other factors, such as CPU usage, might be included as part of the QoS evaluation. For an I/O-bound application, similar rules based on throughput levels instead of response times could also be used to make better scaling decisions.

In our tests, the middleware was configured to avoid response times above one second. The actual rule, written in Drools' Rule Language⁴, is depicted in Figure 2. Lines 02-09 initialize variables with response time events, SLA information, and the configured scalability policy. Next, Lines 11-14 count the number of response time events collected during the last 30 seconds. Then, Lines 15-19 check whether more than 5% of those requests violated the specified SLA (i.e., fewer than 95% met it). Lines 20-21 avoid concurrent reconfigurations, and Lines 22-23 account for the delay between a previous reconfiguration and the actual stabilization of the response time. Finally, if all of the previously evaluated conditions are fulfilled, Lines 26-29 generate a high response time complex event. The generated event will be evaluated by another rule that takes the appropriate action to ensure QoS levels are in accordance with the specified SLA.

Evaluations of horizontal and vertical scaling were performed independently. Depending on the chosen policy, the middleware creates and deploys additional service replicas on new nodes (horizontal) or migrates the service to more powerful nodes (vertical). Similarly, when the load on the system is reduced, service replicas are deactivated or migrated to smaller machines.
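For concreteness, a minimal Java sketch of the kind of CPU-intensive operation exposed by Service A is shown below. The class and method names are illustrative, not the benchmark's actual code; what matters is the deliberately inefficient, exponential-time recursion.

```java
// Illustrative sketch of Service A's CPU-intensive operation:
// a naive, exponential-time recursive Fibonacci.
public class FibonacciService {

    // Naive recursion: fib(n) = fib(n - 1) + fib(n - 2).
    public long fib(int n) {
        if (n <= 1) {
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        // Each request burns CPU time roughly proportional to 2^n.
        System.out.println(new FibonacciService().fib(35));
    }
}
```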
4.2 Experimental Platform
For our experiments, we employed two private servers at our university in addition to Amazon EC2 instances. The DM and the RMA were executed on the private servers: an Intel Core 2 Duo P8700 @ 2.53 GHz with 4 GB of RAM and an Intel Xeon E7-3870 @ 2.40 GHz with 16 GB of RAM, respectively. The service composition was executed on three different Amazon instance types: small (1 virtual core, Xeon E5-2650 @ 2.26 GHz, 1.7 GB of RAM), medium (1 virtual core, Xeon E5-2650 @ 2.26 GHz, 3.75 GB of RAM), and large (2 virtual cores, Xeon E5645 @ 2.0 GHz, 7.5 GB of RAM). All machines were running GNU/Linux 3.2.0 and OpenJDK 6. For the horizontal scaling experiment, we used VMs of the small instance type, whereas for the vertical scaling experiment we employed small, medium, and large instances.

³ Wikipedia's homepage (Portuguese version) access traces for August 27, 2014, available at: http://tools.wmflabs.org/wikiviewstats/
⁴ Drools' Rule Language – http://docs.jboss.org/drools/release/5.2.0.Final/drools-expert-docs/html/ch05.html

```
01  when
02      $reponseTimeEvent : ResponseTimeEvent();
03      $serviceDeployStatus : DeployStatus(
04          instance == $reponseTimeEvent.instance);
05      ResponseTimeEvent(instance == $reponseTimeEvent.instance,
06          this before[1m30s,2m] $reponseTimeEvent);
07      $sla : SLA(metric == "max_response_time",
08          service_or_ip == $serviceDeployStatus.service);
09      $policy : Policy(service == $serviceDeployStatus.service);
10
11      Number($reponseTimeEventSum : doubleValue) from accumulate(
12          $responseTimeCounter : ResponseTimeEvent(
13              $reponseTimeEvent.instance == instance)
14          over window:time(30s), count($responseTimeCounter));
15      Number(intValue > $reponseTimeEventSum * 0.05) from accumulate(
16          $highResponseTimeCounter : ResponseTimeEvent(value > 1000,
17              $reponseTimeEvent.instance == instance)
18          over window:time(30s),
19          count($highResponseTimeCounter));
20      not InProgressReconfiguration(
21          service == $serviceDeployStatus.service);
22      not InStabilizationInterval(
23          service == $serviceDeployStatus.service);
24  then
25
26      insert(new HighResponseTime(
27          $serviceDeployStatus.getIp(),
28          $serviceDeployStatus.getService(),
29          $policy.getPolicy()));
30  end
```

Figure 2: High response time detection rule.
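The reaction to the HighResponseTime complex event produced by this rule is encoded in a separate rule. As a rough plain-Java illustration of what that reaction amounts to (the types and method names below are hypothetical and do not reproduce the middleware's actual Drools rule), the policy carried by the event selects between horizontal and vertical upscaling:

```java
// Hypothetical reaction to the HighResponseTime complex event of Figure 2.
record HighResponseTime(String ip, String service, String policy) {}

interface CloudActions {
    void addReplica(String service);               // horizontal upscaling
    void migrateToLargerInstance(String service);  // vertical upscaling
}

class HighResponseTimeReaction {
    private final CloudActions cloud;

    HighResponseTimeReaction(CloudActions cloud) { this.cloud = cloud; }

    // Map the configured scalability policy to a concrete reconfiguration.
    void onEvent(HighResponseTime event) {
        switch (event.policy()) {
            case "HORIZONTAL" -> cloud.addReplica(event.service());
            case "VERTICAL"   -> cloud.migrateToLargerInstance(event.service());
            default -> throw new IllegalArgumentException(
                "Unknown policy: " + event.policy());
        }
    }
}
```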
4.3 Experimental Results
Figure 3 and Figure 4 show the experimental results, including monitoring and reconfiguration data. Each figure separates the results into two graphs to facilitate the understanding of the relationship between response time and machine workload. The graph on the top shows, on the left axis, the response time experienced by Service B's clients during the execution; on the right axis, it shows the number of requests per second made to the service. The graphs on the bottom part of the figures track the workload on the virtual machines. The blue boxes highlight the time intervals between the moment the system detects the need for a reconfiguration and the end (deployment/reification) of the reconfiguration task.

As expected, the workload on the nodes that host Service A is tied to the response time: when the workload increases, the response time increases as well. This behavior prompts a reaction from the RMA during the evaluation of the event correlation rules, triggering a reconfiguration intended to maintain the QoS level established by the pre-configured SLA. When we analyze the service response times, we see that the adaptation mechanisms of the middleware are capable of maintaining the expected QoS most of the time. There are some occasional spikes in the response time, which can be attributed to the delay between the increase in load and the reaction of the middleware, due to the use of a 30-second sampling window by the rule engine.
[Figure 3: Horizontal Scaling. Top graph: average response time (ms, left axis) and requests per second (right axis) over time (s); bottom graph: CPU usage (%) of Replicas 1, 2, and 3.]
4.3.1 Horizontal Scaling

The simulation begins with only one small instance. Since the number of requests is high, SLA violations are detected right at the beginning of the execution (first ∼100s). These violations trigger the reconfiguration mechanisms of the middleware; since it was configured to perform horizontal scaling, it decides to add a new replica. As the number of requests decreases from ∼600s to ∼2000s of execution time, the low CPU usage triggers a new reconfiguration, releasing the recently created instance. Once again, as the load on the system increases, a new reconfiguration is performed at ∼4980s to increase the number of simultaneous instances to two.

Right at the end of each reconfiguration, we can observe a brief increase in the average response time. This higher-than-normal response time is due to the hand-over procedure performed between the previous and the new instances on which the service is deployed. Additionally, at about 3100s of execution, there is a peak in response time that we could not explain; we suspect it was related to an overload on the shared physical machine hosting our VM.

Removal of a service instance is simpler than addition and, therefore, this kind of reconfiguration is faster. On average, our experiments took ∼55s to perform reconfigurations that only removed an instance and ∼355s to carry out those that involved the creation of a new cloud instance.
4.3.2 Vertical Scaling
In contrast to the horizontal scaling approach, vertical scaling performs mainly service migrations across different VM instance types to maintain QoS; only when the usage of the most powerful instance is close to its maximum does the middleware decide to use more replicas of the services. Since we used the same access patterns as in the previous simulation, we can compare their results. The first noticeable difference is that, in this approach, the middleware never needed more than one replica. Amazon's medium and large instances were enough to fulfill the response time requirements, even when the number of requests greatly increased after 3500 seconds of simulation.

[Figure 4: Vertical Scaling. Top graph: average response time (ms, left axis) and requests per second (right axis) over time (s); bottom graph: CPU usage (%) of the small, medium, and large instances.]

Similarly to the horizontal scaling experiment, at the beginning of the execution the number of requests is too high to be fulfilled by a single small instance. SLA violations are detected and a reconfiguration is performed, as expected, almost at the same time as in the previous experiment (around 100s of execution). As the number of requests decreases from ∼600s to ∼2000s of execution time, the low level of CPU use triggers a new reconfiguration, releasing the medium instance in favor of a small one. Once again, as the load on the system increases, a new reconfiguration is performed at ∼3100s to use a medium instance until ∼6540s, when a new upgrade from a medium to a large instance is performed. On average, each reconfiguration took ∼298s. Since vertical scaling always involves the creation of a new instance, differently from horizontal scaling, the duration of every reconfiguration is virtually the same.

Throughout the experiments, the specified SLA was observed most of the time. The middleware only violated the SLA in 7% and 1.5% of the requests for the horizontal and vertical policies, respectively. These percentages do not include the violations that occurred during the first reconfiguration, at the beginning of the execution. If we consider those violations, the percentages rise to 22% and 3.7%, respectively. This difference highlights the impact a poor initial deployment has on QoS. However, it also demonstrates our middleware's capability to dynamically adapt to QoS degradation. Moreover, even when the SLA was being respected, downscaling reconfigurations intended to reduce the financial costs of operation were performed, for example, at around 1800s and 2000s of execution time for the horizontal and vertical policies, respectively.
5. RELATED WORK
Research on choreography QoS maintenance has been gaining importance due to the ever-increasing use of service compositions, and it currently revolves around two main approaches. The first approach employs service selection based on service degradation factors [14, 7, 17, 20]: when a service degradation is detected, a search is performed and an equivalent non-degraded service is chosen to fulfill the requests in lieu of the degraded one. The second approach is based on service reconfiguration, taking into account resource utilization and the topology of the service [21, 18]; it uses both service migration and horizontal scaling to maintain an acceptable QoS.

Differently from related work [19], we use a reflective approach to service composition reconfiguration [9]. Most related works directly use the results of monitoring analysis to perform deployment reconfigurations. In our case, however, monitoring analysis is performed using CEP to update queryable representations of the deployment, statuses, and specifications of the system. Then, using a rule-based approach over this meta-data, we delineate the most appropriate reconfiguration strategy in an incremental fashion.
6. CONCLUSION
In this paper, we presented a middleware to support the easy enactment of reflective web service choreographies on the cloud. Our middleware maintains runtime information about choreography specifications, deployment statuses, and QoS attributes. An event-based monitoring approach based on CEP analyzes the system performance and is able to reflectively reason about and modify its own behavior. The rules governing this self-adaptation are based on QoS properties, such as response time, and are defined by the application developer. Experimental results show that our system is able to deploy an experimental composition and adapt it to cope with load variations, limiting resource usage while allowing the expected QoS to be violated only for brief periods of time.

The combination of reflection and CEP provides great flexibility in handling different situations. Indeed, reflection enables the middleware to gather and maintain knowledge about all relevant aspects of the choreography, feeding data to the CEP module. Concomitantly, CEP rules may be arbitrarily complex and deal with any number of detected events. Our middleware focuses on performance aspects, but the approach can be applied to other needs as well.

A limitation of the current prototype is the manual creation of the reconfiguration rules based on the developer's knowledge of the services. We plan to use the Scalability Explorer tool [13] to reflectively create rules for service compositions. The choice of a rule-based CEP engine will allow us to easily adopt a vast number of different reconfiguration strategies in our prototype.
Acknowledgments

This work was funded by the European Commission's FP7-ICT-2009-5 under grant agreement number #257178 (project CHOReOS – Large Scale Choreographies for the Future Internet – www.choreos.eu), and by FAPESP, CNPq, and CAPES.
7. REFERENCES
[1] A. Barker et al. Choreographing web services. IEEE Transactions on Services Computing, 2(2):152–166, 2009.
[2] A. Bertolino et al. Glimpse: a generic and flexible monitoring infrastructure. In Proceedings of the 13th European Workshop on Dependable Computing, pages 73–78. ACM, 2011.
[3] D. Ardagna, G. Casale, M. Ciavotta, J. F. Pérez, and W. Wang. Quality-of-service in cloud computing: modeling techniques and their applications. Journal of Internet Services and Applications, 5(1):1–17, 2014.
[4] B. Hamida et al. Monitoring service choreographies from multiple sources. In Proceedings of the 4th Intl. Workshop on Software Engineering for Resilient Systems, volume 7527 of Lecture Notes in Computer Science, pages 134–149. Springer, 2012.
[5] D. C. Luckham. The Power of Events. Addison-Wesley, 2002.
[6] T. Furtado, E. Francesquini, N. Lago, and F. Kon. Towards an enactment engine for dynamically reconfigurable and scalable choreographies. In Proceedings of the IEEE First Intl. Workshop on Service Orchestration and Choreography for the Future Internet (OrChor), June 2014.
[7] G. Wang et al. Service level management using QoS monitoring, diagnostics, and adaptation for networked enterprise systems. In Ninth IEEE International EDOC Enterprise Computing Conference, pages 239–248. IEEE, 2005.
[8] V. Issarny, N. Georgantas, S. Hachem, A. Zarras, P. Vassiliadis, M. Autili, M. Gerosa, and A. Hamida. Service-oriented middleware for the future internet: state of the art and research directions. Journal of Internet Services and Applications, 2(1):23–45, 2011.
[9] F. Kon, F. Costa, G. Blair, and R. H. Campbell. The case for reflective middleware. Communications of the ACM, 45(6):33–38, 2002.
[10] L. Baresi et al. Toward open-world software: Issues and challenges. Computer, 39(10):36–43, 2006.
[11] M. Bali. Drools JBoss Rules 5.X Developer's Guide. Packt Publishing, 2013.
[12] M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: design, implementation, and experience. Parallel Computing, 30(7):817–840, 2004.
[13] P. Moura and F. Kon. Automated scalability testing of software as a service. In 8th Intl. Workshop on Automation of Software Test (AST), pages 62–68. IEEE, 2013.
[14] P. Vienne and J.-L. Sourrouille. A middleware for autonomic QoS management based on learning. In Proceedings of the 5th International Workshop on Software Engineering and Middleware (SEM '05), pages 1–8, New York, NY, USA, 2005. ACM.
[15] P. Vromant et al. On interacting control loops in self-adaptive systems. In Proceedings of the 6th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems, pages 202–207. ACM, 2011.
[16] Z. Qiu, X. Zhao, C. Cai, and H. Yang. Towards the theoretical foundation of choreography. In Proceedings of the 16th International Conference on World Wide Web, pages 973–982. ACM, 2007.
[17] R. B. Halima et al. Non-intrusive QoS monitoring and analysis for self-healing web services. In Proceedings of the 1st International Conf. on the Applications of Digital Information and Web Technologies (ICADIWT), pages 549–554, 2008.
[18] R. Calinescu et al. Dynamic QoS management and optimization in service-based systems. IEEE Transactions on Software Engineering, 37(3):387–409, 2011.
[19] O. Saleh et al. Monitoring and autoscaling IaaS clouds: A case for complex event processing on data streams. In Proceedings of the IEEE/ACM UCC '13, pages 387–392. IEEE Computer Society, 2013.
[20] S. S. Yau et al. Developing service-based software systems with QoS monitoring and adaptation. In 12th IEEE International Workshop on Future Trends of Distributed Computing Systems, pages 74–80, 2008.
[21] W. Li. QoS assurance for dynamic reconfiguration of component-based software systems. IEEE Transactions on Software Engineering, 38(3):658–676, 2012.