An Architectural Approach for Event-based Execution Management in Service Oriented Infrastructures

Spyridon V. Gogouvitis, Kleopatra Konstanteli, Dimosthenis Kyriazis and Theodora Varvarigou
School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Email: {spyrosg, kkonst, dimos}@mail.ntua.gr; [email protected]
Abstract—In order to enable interactive applications to benefit from the offerings of the Cloud paradigm, factors such as QoS and event handling need to be taken into consideration in the development of new platforms. In this paper we propose a hierarchical Execution Management System architecture that enables the execution of real-time interactive applications on SOIs by providing the required QoS levels.

Keywords—Execution Management; Quality of Service; Cloud Computing; Service Oriented Infrastructures; Interactive Real-time Applications
I. INTRODUCTION

Service Oriented Architecture (SOA) [1] is an architectural style that emphasizes the implementation of components as modular services that can be discovered and used by clients. Infrastructures based on the SOA paradigm are called Service Oriented Infrastructures (SOIs). Cloud Computing [2], one of the buzzwords of today, can be considered an evolution of SOIs. The Cloud Service Model [3] covers all layers of IT, including infrastructure, platform and application, hence the terms Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [4]. The IaaS provider offers raw machines on a demand basis, possibly concealing the infrastructure through virtualization techniques. The PaaS provider provisions a development environment that allows for the adaptation of an application to a SOI, as well as acting as the mediator between the SaaS and the IaaS providers. The SaaS provider offers an application as a service over the Internet, aiming to benefit from the opportunities this approach has to offer.

Many interactive applications rich in multimedia content can be deployed in such infrastructures. These have strict timing requirements and can be classified as real-time systems. A real-time system is one whose correctness depends not only on its final result but also on the time at which that result is produced [5]. When any deviation from the timing constraints is detrimental to the system, it is considered a hard real-time system. When some deviations from the timing requirements are acceptable, as long as they are few in occurrence and within predefined boundaries, the system can be considered soft real-time.
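The hard/soft real-time distinction above can be made concrete with a small sketch. The function names and the 5% miss bound are purely illustrative, not part of any cited system:

```python
import time

def run_with_deadline(task, deadline_s):
    """Run `task` and report whether it met its deadline.

    In a hard real-time system any deadline miss is a failure;
    in a soft real-time system occasional, bounded misses are tolerated.
    """
    start = time.monotonic()
    result = task()
    elapsed = time.monotonic() - start
    return result, elapsed <= deadline_s

def soft_rt_acceptable(deadline_met_flags, max_miss_ratio=0.05):
    """Soft real-time acceptance: misses are few in occurrence,
    i.e. the observed miss ratio stays under a predefined bound."""
    misses = sum(1 for met in deadline_met_flags if not met)
    return misses / len(deadline_met_flags) <= max_miss_ratio
```

Under this reading, a hard real-time system requires `run_with_deadline` to report a met deadline on every invocation, while a soft real-time system only requires `soft_rt_acceptable` to hold over a window of invocations.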
In this context, largely inspired by the IRMOS project, we propose a hierarchical Execution Management System architecture that enables the execution of real-time applications on SOIs by providing the required QoS levels. The system is also designed to allow for the specification of events from various sources and consumers, as well as the definition of actions based on them.

II. RELATED WORK

Most service oriented execution management systems come from the Grid area [6]. In this section we present related work that focuses on the systems management space, and specifically on approaches dealing with QoS provision.

Taverna [7] is a WfMS that follows a centralized architecture, which raises questions about the scalability of the system. Taverna supports web services but does not provide any QoS guarantees. However, the system provides monitoring of running workflows and a friendly environment for users to manipulate them. As far as fault tolerance is concerned, it allows for the definition of either a retry operation or an alternate location of the same service.

The Askalon project [8] is mainly focused on performance-oriented applications. The project follows a decentralized architecture, but with a global decision-making mechanism. Users are able to specify high-level constraints and properties centered on execution time, and the workflow is scheduled based on performance prediction. Askalon provides monitoring of the executed workflow but does not allow for user interactivity. Checkpointing and migration techniques are used for fault tolerance.

The Amadeus environment [9] follows a centralized approach. QoS parameters concerning time and cost are supported, and performance prediction is carried out in order to find the optimal resource, while SLAs are also provisioned for the agreement between the user and the provider of the service. It does not supply any form of fault tolerance or monitoring of the execution of the workflow.
The GrADS project [10] is based on the Globus Toolkit (GT) [11] and aims at applications with large computational and communication load. It supports the specification of workflows, which are analyzed so that the dependencies between the tasks are identified. This facilitates the parallelization of the tasks and allows scheduling algorithms to be applied. GrADS also supports QoS constraints by estimating the application execution time through historical data and analytical modeling.

The Kepler workflow system [12] is an open source application that extends the work of the Ptolemy II [13] system to support scientific applications using a dataflow approach. Its main characteristic is that it is based on processing steps, called actors, which have well defined input and output ports. Users are able to define workflows by selecting appropriate actors and connecting them within a visual user interface. A director component holds the overall execution and component interaction semantics of a workflow. The Kepler system provides various fault-tolerant mechanisms, the most important of which is the ability to define actors that are responsible for catching exceptions.

III. PLATFORM ARCHITECTURE

The proposed architecture follows the Cloud Service Model and can therefore be divided into three layers that correspond to SaaS, PaaS and IaaS. Our solution is mainly concerned with the PaaS layer, which contains services that are responsible for provisioning and managing the execution of real-time services on the IaaS at the request of the Application Layer. The service components that make up the application are packaged in Virtual Machine Units (VMUs) and are deployed on a Virtualized Environment. The physical hosts on top of which the Virtualized Environment runs make use of a real-time scheduler [14] that allows for fine-grained control of the resources made available to each VMU. The Platform services include the Application Modelling Tool, the Performance Estimation Service, the Workflow Manager, the Workflow Enactor, the Monitoring service, the Evaluator, the SLA Management System, the Discovery service and the Portal service.
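The scheduler cited above [14] provides hierarchical CPU reservations in the Linux kernel. As a rough intuition for how per-VMU resource control of this kind can be expressed, the following sketch shows a reservation as a (budget, period) pair together with a simplified, single-core utilization-based admission test. All names and the capacity bound are illustrative assumptions, not the actual interface of the cited scheduler:

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    """A CPU reservation for one VMU: `budget_us` microseconds of CPU
    time guaranteed in every `period_us`-microsecond period."""
    vmu_id: str
    budget_us: int
    period_us: int

    @property
    def utilization(self) -> float:
        return self.budget_us / self.period_us

def admits(reservations, new_res, capacity=1.0):
    """Simplified single-core admission test: accept the new reservation
    only if the total utilization stays within the core's capacity."""
    total = sum(r.utilization for r in reservations) + new_res.utilization
    return total <= capacity
```

In this model, a VMU that is admitted is guaranteed its budget every period regardless of the other VMUs on the host, which is the property the platform relies on to keep per-component QoS predictable.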
The architecture of the platform is shown in Fig. 1.

Figure 1. Architecture

IV. PHASES

The process of deploying an application within the IRMOS platform and using it can be divided into five distinct phases: (1) Service Engineering, (2) Publication, (3) Negotiation, (4) Execution, and (5) Event Handling.

A. Service Engineering Phase

In this phase the Application Service Provider models the application so that it can be deployed within the Virtualized Environment. Each application is a workflow possibly consisting of numerous services, each of which provides some discrete functionality and is called an Application Service Component (ASC). The Application Developer interacts with an Application Modelling Tool which enables the definition of the input and output interfaces of an ASC, as well as of the required computing and network resources, creating a document called the Application Service Component Description (ASCD). Each ASC is thereafter benchmarked, and models describing the relationship between different input and output parameters and resource specifications are generated. These are used to create models for the application as a whole, taking behavioral parameters into consideration as well. The models are subsequently used by the system to translate high-level application requirements into low-level resource requirements in the negotiation phase.

During this phase the Workflow Enactor, the Evaluator and the Monitoring service are also taken into consideration and are treated as part of the application. Therefore resource requirements are also generated for them, and their execution is guaranteed by the IaaS Provider. This is necessary since the overhead they produce during the actual execution is also modelled during the performance estimation stage.

B. Publication Phase

ASCs, including their binaries and ASCDs, as well as A-SLA templates, must be published to the PaaS Provider domain. This takes place in two sub-phases, as depicted in Figure 2:

1) ASC Publication: This task implies that the ASCDs have already been created. These ASCDs are uploaded to the ASC repository, a dedicated repository for storing data related to the ASCs. The ASC publication process also includes the publication of the corresponding ASC binary to the ASC repository. It should be noted that each ASCD also contains a link to the corresponding binary ASC package. The above two processes take place in the same step, as depicted in steps 1 to 3 in Figure 2. In more detail, the Application Service Provider uploads a directory of predefined structure that includes both the ASCD and the binary; therefore the link to the binary ASC package
Figure 2. Publication Phase Sequence Diagram
that each ASCD carries is actually a relative path pointing to the location of the binary inside its directory. These steps are repeated for each ASC, and each time the Application Service Provider obtains a link to the specific location in the ASC repository where the information about that ASC is stored.

2) Application Publication: Having published the ASCs and created the A-SLA templates that correspond to different ways of using the same application (different workflow and/or different level of QoS), the Application Service Provider publishes these A-SLA templates to the PaaS Provider domain by uploading the corresponding descriptive files to the dedicated repository (via the A-SLA Manager), as depicted in steps 4 to 10 in Figure 2. It should be stressed at this point that each A-SLA template builds heavily on the ASCDs and also includes a link to the workflow description of the application it represents.

C. Negotiation Phase

During this phase the PaaS is in charge of discovering and reserving the resources needed for the execution of an application based on high-level QoS parameters passed by the consumer. As shown in Figure 3, the phase begins with the customer requesting a list of available A-SLA templates for a given application. After the customer selects and fills in an A-SLA template with specific QoS parameters, the concrete form of the A-SLA template is passed back to the IRMOS Portal for negotiation. The latter extracts the new information from the A-SLA template and, provided its format is valid, contacts the SLA Negotiator. After storing this information, the SLA Negotiator invokes the Discovery Service, which in turn queries the Index Service to find available resources that meet the client's requirements. A list of candidate providers is then returned to the SLA Negotiator, which then contacts the Performance Estimation Service (PES). The high-level requirements in the A-SLA
Figure 3. Negotiation Phase Sequence Diagram
template are transformed by the PES into a request to the IaaS Provider containing low-level requirements for negotiation. If the IaaS accepts, the cost is propagated to the customer. If the customer also accepts, an SLA, called the Application SLA (A-SLA), is signed between the platform and the customer. A different SLA, called the Technical SLA (T-SLA), is signed between the platform and the IaaS provider. Finally, important configuration information about the ASCs is visualized to the customer via the Portal. The consumer is then allowed to use the application within the reserved time-frame.

D. Execution Phase

The execution phase starts when the consumer logs in to the service-based platform portal, namely the IRMOS Portal, and asks for the execution of the application. All the components that comprise the workflow have already been deployed within VMUs. From then on the consumer is able to use the application for the time for which the resources have been reserved. We can distinguish between two sub-phases, as depicted in Fig. 4:

1) Initialization: In this phase, the platform services are responsible for the initialization of the dedicated components, i.e. the Workflow Enactor (WfE) and Monitoring, which reside inside the Virtualized Environment. The sequence
begins with a consumer requesting the execution of an application by invoking the execution operation on the IRMOS Portal. The consumer provides an A-SLA EPR, which is a unique identifier from a previously negotiated A-SLA, and a client ID. The IRMOS Portal filters the input, authenticates the consumer, and afterwards delegates the task to the A-SLA Manager. In the first step, the A-SLA Manager retrieves the information needed for the execution, i.e. the WfE EPR, from the client's A-SLA and passes it on to the IRMOS Portal. Using the EPR of the WfE resource, the IRMOS Portal activates it and binds it to the specific A-SLA. In the next step, the Workflow Manager activates the dedicated Monitoring Service resources. The latter subscribes the consumer for receiving notifications about monitoring and violations during runtime. Once the consumer's subscription is complete, the Monitoring Service activates the dedicated Monitoring instance that resides inside the Virtualized Environment.

2) Start Execution: After a successful initialization, the Workflow Manager initiates the execution of the workflow application. It should be noted that, prior to the first execution of the application, the WfE configures all ASCs that comprise the application using configuration information from the consumer that is already included in the WfE resource created during the negotiation phase. During the execution, the Monitoring instance draws information about the performance of the ASCs, as well as monitoring information coming from the IaaS, and communicates it to the Monitoring service. Monitoring information and possible violations are visualized to the consumer through the Monitoring web interface, which is integrated into the IRMOS Portal.
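The initialization steps above can be sketched as a small orchestration. The collaborating objects stand in for the IRMOS components named in the text; their method names and signatures are illustrative assumptions, not the components' real interfaces:

```python
class ExecutionCoordinator:
    """Illustrative sketch of the execution-phase initialization sequence;
    the collaborators are hypothetical stand-ins for the IRMOS services."""

    def __init__(self, portal, asla_manager, workflow_manager, monitoring):
        self.portal = portal
        self.asla_manager = asla_manager
        self.workflow_manager = workflow_manager
        self.monitoring = monitoring

    def execute(self, asla_epr, client_id):
        # 1. Filter the input and authenticate the consumer.
        if not self.portal.authenticate(client_id):
            raise PermissionError("unknown consumer")
        # 2. Retrieve the Workflow Enactor EPR from the client's A-SLA.
        wfe_epr = self.asla_manager.wfe_epr(asla_epr)
        # 3. Activate the enactor resource and bind it to the A-SLA.
        wfe = self.portal.activate(wfe_epr, asla_epr)
        # 4. Subscribe the consumer to runtime notifications (metrics,
        #    violations) and activate the dedicated Monitoring instance.
        self.monitoring.subscribe(client_id)
        self.monitoring.activate_instance(asla_epr)
        # 5. Start the workflow; the enactor configures all ASCs first.
        self.workflow_manager.start(wfe)
        return wfe
```

The ordering matters: monitoring is activated before the workflow starts, so that performance data and violations are captured from the first moment of execution.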
Figure 4. Execution Phase Sequence Diagram

Figure 5. Event Driven Reconfiguration Sequence Diagram
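The event-driven reconfiguration flow of Fig. 5 can be approximated as a dispatch from event type to corrective action. The event names mirror the cases identified in this paper, but the handler registry and the returned action strings are purely illustrative:

```python
# Illustrative event dispatch for the reconfiguration flow of Fig. 5.
HANDLERS = {}

def on(event_type):
    """Register a handler function for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on("migration")
def handle_migration(event):
    # The Workflow Enactor may need to reconfigure and restart the ASC.
    return f"reconfigure and restart {event['asc']}"

@on("app_event")
def handle_app_event(event):
    # Propagated by the Evaluator; the action is read from the workflow.
    return f"apply workflow action {event['action']}"

@on("tsla_violation")
def handle_tsla(event):
    # The customer must choose: continue, terminate, or renegotiate.
    return f"notify customer, choice required for {event['sla']}"

def dispatch(event):
    """Route an event to the handler registered for its type."""
    return HANDLERS[event["type"]](event)
```

A registry of this shape makes it straightforward to attach new event sources and actions without changing the dispatch logic itself.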
E. Event Handling Phase

This phase covers the possible events that may take place during the execution of an application. Within the IRMOS context, four different cases of events have been identified. These cases differ as to the source that produces the event and the action that needs to be undertaken by the platform in the aftermath:

1) Migration: This is the case in which the IaaS performs a migration of a VMU to another physical host. This means that the Workflow Enactor may need to reconfigure and restart the ASC.

2) Application specific events: The user is able to define application events in the workflow description document. These can be propagated by the Evaluator to the Workflow Enactor in order for the proper action to be taken, as described in the workflow. For example, a multimedia application may require a different codec to be used if the number of concurrent users exceeds a predefined threshold.

3) T-SLA violation: This is the case in which the IaaS provider notifies the PaaS provider that a breach in the signed T-SLA will occur and that the IaaS cannot take any corrective
actions. In this case the customer is notified by the PaaS and given the following choices: 1) Continue with the execution even though the QoS is not guaranteed. 2) Terminate the execution, in which case the customer is compensated. 3) Terminate the execution and renegotiate.

4) A-SLA violation: This type of violation is detected by the platform itself and is an indication that one of the following three events is happening or is about to happen: a) The IaaS is not providing the agreed resources and no T-SLA violation has been reported; this is treated as described above. b) The application is not functioning as described in the A-SLA, for example more users than agreed upon are accessing the application. c) An inaccurate estimation was performed by the platform.

V. CONCLUSION

Given the need of many applications for real-time interactivity, we propose an architecture, targeted towards the Cloud Model, that provides the functionalities needed to implement and deploy application workflows over a Service Oriented Infrastructure. The two-layer approach of our proposal allows for end-to-end QoS delivery, while the ability to specify and handle events makes feasible the enactment of composite workflows, aiming to carry out more complex and interactive applications.

ACKNOWLEDGMENT

This research has been partially supported by the IRMOS EU co-funded project (ICT-FP7-214777).

REFERENCES

[1] T. Erl, Service-oriented Architecture: Concepts, Technology, and Design. Upper Saddle River: Prentice Hall, 2005.
[2] B. Hayes, "Cloud computing," Commun. ACM, vol. 51, no. 7, pp. 9–11, 2008.
[3] P. Mell and T. Grance. (2009) The NIST Definition of Cloud Computing, Version 15. [Online]. Available: http://csrc.nist.gov/groups/SNS/cloud-computing
[4] C. Weinhardt, A. Anandasivam, B. Blau, and J. Stößer, "Business Models in the Service World," IT Professional, vol. 11, no. 2, pp. 28–33, March 2009.
[5] L. Sha, T. Abdelzaher, K.-E. Arzen, A. Cervin, T. Baker, A. Burns, G. Buttazzo, M. Caccamo, J. Lehoczky, and A. K. Mok, "Real Time Scheduling Theory: A Historical Perspective," Real-Time Syst., vol. 28, pp. 101–155, November 2004.
[6] I. Foster and C. Kesselman, Eds., The Grid: Blueprint for a New Computing Infrastructure. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999.
[7] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble, A. Goderis, D. Hull, D. Marvin, P. Li, P. Lord, M. R. Pocock, M. Senger, R. Stevens, A. Wipat, and C. Wroe, "Taverna: lessons in creating a workflow environment for the life sciences," Concurr. Comput.: Pract. Exper., vol. 18, no. 10, pp. 1067–1100, 2006.
[8] T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem, F. Nerieri, S. Podlipnig, J. Qin, M. Siddiqui, H.-L. Truong, A. Villazon, and M. Wieczorek, "ASKALON: A Development and Grid Computing Environment for Scientific Workflows," in Workflows for e-Science, I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, Eds. Springer London, 2007, pp. 450–471.
[9] I. Brandic, S. Pllana, and S. Benkner, "Specification, planning, and execution of QoS-aware Grid workflows within the Amadeus environment," Concurr. Comput.: Pract. Exper., vol. 20, pp. 331–345, March 2008. [Online]. Available: http://portal.acm.org/citation.cfm?id=1348685.1348686
[10] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, D. Reed, L. Torczon, and R. Wolski, "The GrADS project: Software support for high-level grid application development," International Journal of High Performance Computing Applications, vol. 15, pp. 327–344, 2001.
[11] "The Globus Toolkit." [Online]. Available: http://www.globus.org
[12] B. Ludascher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, "Scientific workflow management and the Kepler system," Concurr. Comput.: Pract. Exper., vol. 18, no. 10, pp. 1039–1065, 2006.
[13] J. Eker, J. W. Janneck, E. A. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S. Sachs, and Y. Xiong, "Taming heterogeneity - the Ptolemy approach," Proceedings of the IEEE, vol. 91, no. 1, pp. 127–144, 2003.
[14] F. Checconi, T. Cucinotta, D. Faggioli, and G. Lipari, "Hierarchical Multiprocessor CPU Reservations for the Linux Kernel."