
An Adaptive Monitoring Framework for Ensuring Accountability and Quality of Services in Cloud Computing Hassan Mahmood Khan

Gaik-Yee Chan*, Fang-Fang Chua

Faculty of Computing and Informatics Multimedia University Cyberjaya, Malaysia

To gain these benefits, there should be trust between the Cloud Service Consumer (CSC), the SaaS Provider (SP) and the Cloud Facilitators (CF). In other words, the CF is accountable for providing efficient and effective services to the SP; similarly, the SP is accountable for providing efficient and effective services to the CSC. As such, there should be a binding agreement on the services provided between the SP and CF, and between the SP and CSC. This binding agreement is known as the Service Level Agreement (SLA). Refer to Fig. 1 for further details.

Abstract—The Cloud computing platform has gained popularity among service providers and consumers for performing business operations, due to its ease of communication and transactional convenience in terms of accessibility and availability. However, given the vulnerability of this dynamic, open environment, it is crucial to have a binding agreement between all service parties to ensure trust while fulfilling the expected Quality of Service (QoS). There is a need to improve on current Service Level Agreement (SLA) practice, which does not focus on QoS and accountability assurance. In this paper, we propose an adaptive monitoring framework to dynamically monitor QoS metrics and performance measures in order to verify compliance with the respective SLAs. The framework is validated with scenarios on response time and availability, which show that it provides adaptive remedy actions to rectify violation situations. In addition, any service party found to be non-compliant with its SLAs shall be penalized in monetary terms.

As shown in Fig. 1, whenever a CSC uses a Cloud service, such as an e-commerce application provided by the SP, the CSC expects the SP to provide the service according to the quality of service promised in the SLA. For example, the CSC expects the service to be available 99% of the time, i.e., 24/7, with a response time for each request within 300ms. Likewise, the SP expects the CF, who provides the infrastructure and platform services, to provide the quality of service specified in the respective SLAs; for example, the CF promises to provide 500GB of database storage over a period of one month for the SP's e-commerce application. To ensure such promises are realized, there should be a dynamic mechanism to monitor and verify QoS metrics together with performance measures. Additionally, to build accountability among the three parties in the event of an SLA violation, there should be adaptive remedy actions by the respective party to rectify the situation, or a penalty in monetary terms in case the low-performing service cannot be rectified.
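The SLA expectations in this example can be captured as a small machine-readable structure. The following is a minimal sketch assuming only the two terms just mentioned; the class and field names are our own illustration, not part of the paper:

```python
from dataclasses import dataclass

@dataclass
class SLATerm:
    """One agreed QoS term in an SLA; names and values are illustrative."""
    metric: str
    agreed_value: float
    unit: str
    higher_is_better: bool  # availability: higher is better; response time: lower

    def complies(self, measured: float) -> bool:
        """Check a dynamically monitored value against the agreed value."""
        if self.higher_is_better:
            return measured >= self.agreed_value
        return measured <= self.agreed_value

# The CSC-SP expectations from the example above: 99% availability,
# 300 ms response time per request.
csc_sp_sla = [
    SLATerm("availability", 99.0, "%", higher_is_better=True),
    SLATerm("response_time", 300.0, "ms", higher_is_better=False),
]
```

A monitored response time of 250 ms would then satisfy complies(), while 380 ms would not.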

Keywords—accountability; quality of service; service level agreement; software as a service

I. INTRODUCTION

Cloud computing has emerged as an important platform for companies to provide their services, ranging from infrastructure as a service (IaaS) and platform as a service (PaaS) to software as a service (SaaS). Cloud computing has established a new computing paradigm and provided new business models that enable on-demand provisioning of computing and storage resources [1]. This new paradigm is based on offering computing resources on a pay-per-use basis [2]. Web or Web Service-based applications, especially e-commerce applications, enable end users to trade products or services online and serve as SaaS in the Cloud environment. The interdependent relationships between the Cloud service consumer, the SaaS provider and the Cloud facilitator, together with the new paradigm, have enabled e-commerce activities and trading to be carried out effectively and efficiently. Therefore, small and medium enterprises are very responsive to deploying e-commerce applications over the Cloud environment so as to tap the benefits of multi-tenancy, with reductions in time and cost together with improved efficiency and effectiveness for trading online [3].

Fig.1. The Binding Agreements in Cloud Computing

* Corresponding author: Tel: +603 8312 5215, Email: [email protected]

978-1-5090-1724-9/16/$31.00 ©2016 IEEE

249

ICOIN 2016

However, studies in [4, 5, 6, 7, 9, 10] and the provider practices surveyed in [8] show that current SLAs do not focus much on quality of service (QoS) and accountability assurance. Service availability is the only main criterion that is assured, and penalties are imposed on that basis; other QoS parameters, including performance, receive little consideration. Further discussion of this issue can be found in Section II.

In this paper, a framework to dynamically monitor QoS metrics and performance measures to verify compliance with the respective SLAs is therefore proposed. The proposed framework shall detect SLA violations and adaptively perform remedy actions to rectify low-performance services. To ensure accountability as depicted in Fig. 1 and explained in Section I, a penalty in monetary terms is also imposed in the event that a low-performance service cannot be rectified.

This paper is organized as follows: Section II discusses the background and related studies, Section III describes the proposed framework, Section IV describes the core functions of the framework with scenarios, and Section V concludes and discusses future work.

II. BACKGROUND

Migration from the traditional platform to the Cloud platform has no doubt enabled enterprises to gain many benefits. At the same time, however, the assurance of accountability and quality of services based on SLAs has become a constraint in making the migration decision [4]. The study in [5] identified many Cloud-related issues that need to be considered by Cloud service providers for successful implementation, ranging from ethical, security, legal and jurisdictional issues to data lock-in and technology bottlenecks. The lack of standardized SLAs, performance measurement and QoS are also key issues and challenges in ensuring delivery of the agreed services. Furthermore, research in [6] presented QoS attributes categorized into performance, dependability, trust and cost. The issues related to SLA, QoS, and performance monitoring and measurement are discussed in detail in the following sections.

A. Service Level Agreement (SLA)

A Service Level Agreement (SLA) is a way to formally document the performance, responsibilities and limits of service(s) between the CF, SP and their CSCs. Key attributes of the SLA should include availability, serviceability, performance, operations, billing, and possibly penalties associated with non-conformance to agreed performance. In providing Cloud services, unavailable or under-performing services are the main sources of dispute; a comprehensive SLA would therefore reduce the risk of future conflicts or disputes. However, implementing an SLA is challenging if measurement and monitoring of the services of the CF and SP cannot justify the requirements of the SLA [7]. This problem is significant for SaaS providers who deliver services with specified functionality, such as e-commerce applications. Generally, the problem exists because high-level functional requirements lack comprehensive and stringent service level guarantees, as mentioned in [7]. Most often, agreements are made using non-negotiable standard contracts which mainly protect the rights of the CF or SP while the CSC is neglected [7]. This minimizes the CF's and SP's liability without supporting the CSCs in case of dispute or service discrepancy, and hence causes 'distrust' among the three parties. Therefore, in this emerging Cloud services industry, the SLA is going to be the key document for building trust and accountability among the three parties.

A study conducted in [8] comparing the SLAs of Amazon, Rackspace, Microsoft Windows Azure, Terremark vCloud Express, and Storm on Demand found that CFs mainly offer IaaS or PaaS SLAs but do not focus on SaaS SLAs. These IaaS or PaaS SLAs focus only on availability or request completion rate and lack other measurements and monitors such as service guarantee, service guarantee time period and granularity, service violation detection and credit. There is also a suggestion to standardize SLAs so as to allow CSCs to make comparisons. From the CSCs' perspective, it is assumed that through the SLA they are guaranteed quality of services [9]. However, with the lack of standardization of SLAs and of measurement and monitoring of QoS metrics, this guarantee is hard to realize. This reflects an urgent need to design an SLA on the basis of SaaS-specific attributes and standards that can, on the one hand, measure and monitor the performance of SaaS, specifically e-commerce applications, and, on the other hand, make SPs accountable for their services in case of deficiency in the promised services.

Further shortcomings of SLAs are also mentioned in [10]: SLAs mainly focus on technical attributes and not on the security and trust management aspects of services; SLAs do not define what constitutes a violation of service, the remedy action for a violation, or the cost of a violation; and SLAs are not comprehensive enough for CSCs to understand.

B. Quality of Service (QoS)

Generally, an SLA should contain a set of agreed services provided by the CF and SP. Usually, to quantify the agreed service level, QoS metrics and agreed minimum or maximum values are used. Through literature review, some common examples of QoS attributes are identified and listed in Table I.

TABLE I. List of Common QoS Attributes
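A catalogue of common QoS attributes of the kind Table I lists can be sketched as a small registry, grouped by the four categories reported in [6]. The attribute names below are drawn from the surrounding discussion rather than transcribed from the table, so treat them as illustrative assumptions:

```python
# Common QoS attributes grouped by the four categories from [6]
# (performance, dependability, trust, cost). Names are illustrative.
QOS_ATTRIBUTES = {
    "performance":   ["response_time", "throughput", "scalability"],
    "dependability": ["availability", "reliability"],
    "trust":         ["security", "reputation"],
    "cost":          ["resource_cost", "penalty_cost"],
}

# As the paper observes, accountability (remedy action for violations)
# is typically absent from such catalogues; that gap is what the
# proposed framework targets.
all_attrs = {a for attrs in QOS_ATTRIBUTES.values() for a in attrs}
```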


As seen from Table I, accountability or remedy action for service violation, an important parameter that determines the trust relationship between service providers and consumers, is not included. This shortcoming is also mentioned in [10], among others: CSCs are not provided with the flexibility to determine which QoS parameters are most suitable for them, and CSCs are not given the opportunity to dynamically modify QoS requirements according to their preferences.

C. Performance Monitoring and Measurement

For trustworthy Cloud services, effective mechanisms for monitoring performance and detecting SLA violations are necessary. This increases the end users' level of trust towards the Cloud service providers.

In view of this, the study in [6] presented a framework that combines the advantages of client-side and server-side QoS monitoring. The framework uses event processing to inform users of the current QoS values and possible violations of the SLA. These events then trigger adaptive behavior, such as hosting new service instances if the QoS is not as desired. However, automatic reactions to SLA violations, such as deploying new service instances on-the-fly or dynamically increasing certain virtual machine capabilities, are yet to be implemented.

Another study [11] presented a Business Process Execution Language (BPEL)-based monitoring framework for monitoring Web services in the Cloud environment. This framework collects information from the Cloud, analyzes it, and then takes corrective actions when SLA violations are detected. The framework's run-time monitoring is based on workflow patterns as composed in BPEL. However, this framework is limited to monitoring the response time QoS requirement only.

The study in [12] introduced an SLA-aware-Service (SLAaaS) Cloud model that integrates QoS and SLA into Cloud services. Experiments on online Cloud services through various case studies successfully demonstrated the model's capability to provide request response time, availability, resource usage and resource cost guarantees. Other metrics, such as service throughput and energy cost, are yet to be evaluated.

A cloud application SLA violation detection model, CASViD, is proposed in [13]. This model uses resource allocation, scheduling, and deployment tools to monitor and detect SLA violations at the application layer. Tested in a real Cloud environment, the model demonstrated its capability in monitoring, detecting SLA violations, and suggesting effective measurement intervals for various workloads, but it is efficient only in single-application SLA violation situations; more experiments on multi-tier applications are yet to be conducted.

The study in [4] proposed a QoSAECC framework that can simultaneously monitor and dynamically analyze QoS attributes such as security, performance, timeliness, throughput and reliability as promised by the service providers in SLAs. However, the capability of this framework has yet to be tested on a Cloud platform.

Research in [14] introduced a framework based on mOSAIC (Open-Source API and Platform for Multiple Clouds). This framework consists of a monitoring and warning system capable of monitoring Cloud resources and application components. Additionally, it verifies the compliance of a service through a set of rules and thereby discovers warning conditions. However, performance evaluation of the framework has yet to be carried out.

From these studies, it can be seen that there is still room for further research to improve techniques for monitoring QoS and detecting SLA violations with remedy actions, and to ensure accountability among the CF, SP and CSCs. Motivated by this, we propose an adaptive monitoring framework for ensuring accountability and QoS in Cloud services.

III. OUR ADAPTIVE MONITORING FRAMEWORK

Our proposed framework consists of three main components, namely a component for digitization of SLA parameters, interactive components for dynamic monitoring of QoS, and a core component for dynamic detection of violations and adaptive remedy rectification. Refer to Fig. 2 for further details.

Fig.2. An Adaptive Monitoring Framework for Trusted Cloud Services

The following sections further describe each component in detail with reference to Fig. 2.

A. Digitization of SLA Parameters

A set of defined QoS parameters is stored in a repository with agreed values according to the service slab. These values are defined for each parameter according to the SLA. By simulating these defined values under different client environments, cloud resources, memory variances, network bandwidths and connections, a threshold value for each QoS parameter can be determined. Formulating the maximum, minimum or threshold values based on different configurations provides more flexibility in defining and monitoring the SLA's parameters.
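The threshold determination described in Section III.A, simulating each parameter under varied configurations to fix a per-parameter threshold, might be sketched as follows. The 95th-percentile estimate with headroom below the agreed value is our assumption, since the paper does not specify a formula, and the simulator is hypothetical:

```python
import random

def derive_threshold(simulate, agreed: float, runs: int = 1000) -> float:
    """Derive a warning threshold for a lower-is-better metric (e.g. response
    time) by sampling a simulated service under varied configurations.
    Taking the 95th percentile, capped with 5% headroom below the agreed
    SLA value, is an illustrative choice, not the paper's stated method."""
    samples = sorted(simulate() for _ in range(runs))
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return min(p95, 0.95 * agreed)

# Hypothetical simulator: response time (ms) under varying load and bandwidth.
random.seed(42)
simulated_rt = lambda: random.gauss(mu=240.0, sigma=20.0)
threshold_ms = derive_threshold(simulated_rt, agreed=300.0)
```

With the agreed 300 ms used later in Section IV, this yields a threshold of at most 285 ms, playing the same role as the pre-defined 280 ms threshold used in the scenarios.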


B. Interactive Components

There are four units containing QoS attribute values that are interactive in nature: the values are continuously gathered from the SLA repository and dynamically obtained from the environment or applications. These values are forwarded to the core component for further evaluation, to detect SLA violations, to monitor performance so as to prevent SLA violations, or to implement remedy actions to rectify violations.

• SLA agreed attribute values. All the QoS attribute values and service slabs established according to the SLA are considered the base or benchmark for the specified Cloud service.

• SaaS QoS attribute values. Real-time QoS attribute values based on SaaS or Web Services (WS) are essential for service management and composition. In WS, requests are sent and received using communication protocols (e.g., HTTP, SMTP) or messaging models. The communication protocols use packets for data transmission, and the messaging model requires parsing-intensive XML data. This becomes even more important when WS involve more complex processes or interaction with other services. Monitoring, therefore, helps to identify the health of a service, ensuring QoS by detecting signs of failure in real time, and thus helps to meet service availability, throughput and latency requirements. Even though a real-time monitoring mechanism can precisely reflect the performance and behavior of a WS, it also puts a significant resource burden, or overhead, on the system. Therefore, to avoid high overhead, a configurable monitoring approach that monitors QoS metrics depending on urgency is proposed. For example, availability can be computed over an interval of time rather than instantly at a point in time, as response time is.

• Cloud facilities QoS attribute values. Low-level metrics of the facilities providing infrastructure and platforms, such as service throughput and scalability, are also captured dynamically to assist in SLA violation detection.

• Configuration settings. How efficiently services are provided by SaaS, IaaS and PaaS depends very much on environment-related attributes such as hardware, memory, storage, virtualization facilities, network bandwidth and so on. Any variance in these QoS attribute values may impact the services provided by the Cloud facilitators; thus, these values are also monitored and checked for SLA violations.

C. Core Component

The core component gathers all QoS metrics from the interactive components and then evaluates these values to check for SLA violations. To reduce computational overhead, not all QoS metrics are necessarily measured in real time; some are measured over a period of time owing to the different nature of their computation. For example, response time is a measurement that can be captured at a certain point in time by computing the difference (expressed in milliseconds, ms) between the time a service request is completed and the time it is received; it can therefore be measured frequently and continuously. The availability metric (expressed as a percentage, %), however, must consider downtime (mean time to repair) and uptime (mean time to failure), which can only be measured over a time interval.

IV. SCENARIOS FOR RESPONSE TIME AND AVAILABILITY

In this section, we use a few scenarios to step through the core component of the framework and explain how dynamic monitoring of QoS metrics and performance measures, detection of violations, and adaptive remedy actions for violations are carried out. Two QoS attributes, namely response time and availability, are used to illustrate the process flow through the core component; refer to Table II and Table III for the scenarios related to response time and availability, respectively.

As shown in Table II, Columns 2-3, the SLA agreed response time between the CSC and SP is 300ms, and the predefined threshold obtained through simulation is 280ms. In scenario #1, where the dynamically monitored response time is 250ms, much lower than both the agreed and threshold times, there is no issue as there is no violation of the SLA. In scenario #2, the dynamically monitored response time of 290ms is higher than the threshold but still lower than the agreed time. Although there is no SLA violation, the system displays a warning to prompt further investigation into the cause of the higher-than-threshold response time, so as to prevent a violation at a later stage. Scenario #3 shows a detected violation, since the monitored response time of 380ms is much higher than the agreed response time. This violation is due to excessive user load: the agreed load is 1,000 users but the actual load is 2,000. One remedy action the SP can perform is to increase the memory and/or the number of processors. Since this violation is caused by the users, a penalty in monetary terms shall be imposed on the CSC. Similarly, in scenario #4 there is a violation, as the response time of 380ms is much higher than the agreed 300ms. In this case the violation is due to the SP's internal service error: the Cloud service or SaaS has to be shut down for troubleshooting, repair and recovery. The SP is accountable for this violation, and thus the penalty is imposed on the SP.
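The core component's checks for the two example metrics can be sketched as below. The availability function follows the uptime/downtime computation described in Section III.C, and the three-way outcome (no issue, warning, violation) mirrors the scenario walk-throughs; the function names and the exact treatment of the warning band are our assumptions:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability (%) over an interval, from uptime (MTTF) and downtime (MTTR)."""
    return 100.0 * mttf_hours / (mttf_hours + mttr_hours)

def classify(measured: float, threshold: float, agreed: float,
             higher_is_better: bool) -> str:
    """Outcome of one SLA check: OK, WARNING (past the threshold but still
    within the agreed value), or VIOLATION (past the agreed value)."""
    if higher_is_better:  # flip sign so that "larger is worse" in both cases
        measured, threshold, agreed = -measured, -threshold, -agreed
    if measured <= threshold:
        return "OK"
    if measured <= agreed:
        return "WARNING"
    return "VIOLATION"

# Response-time scenarios from Table II: agreed 300 ms, threshold 280 ms.
print(classify(250.0, 280.0, 300.0, higher_is_better=False))  # OK
print(classify(290.0, 280.0, 300.0, higher_is_better=False))  # WARNING
print(classify(380.0, 280.0, 300.0, higher_is_better=False))  # VIOLATION
```

For availability over a 24-hour window, availability(23.76, 0.24) gives 99.0%, the agreed value used in the availability scenarios.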


TABLE II. Scenarios for Response Time

As shown in Table III, Columns 2-3, the SLA agreed availability over a time period, for example 24 hours, between the SP and CF is 99.00%, and the pre-defined threshold is 99.50%. In scenario #1, the dynamically monitored availability is 99.40%, lower than the threshold but higher than the agreed value; therefore, no violation of the SLA has occurred. In scenario #2, the dynamically monitored availability of 98.40% is well below both the threshold and the agreed value. This indicates that a violation has occurred, due to the CF's internal server error. The CF has to shut down the Cloud services for further troubleshooting and repair, and a penalty is thus imposed on the CF. Scenario #3 also shows a detected violation, as the monitored availability of 90.00% is much lower than the agreed value. This violation is due to a Denial of Service (DoS) attack exploiting security flaws in the e-commerce application. The SP has to shut down the Cloud service for further investigation and repair; a penalty is thus imposed on the SP, who is accountable for the violation.

TABLE III. Scenarios for Availability

V. CONCLUSION AND FUTURE WORK

Our proposed framework caters for accountability among all three parties that consume and provide Cloud services. By dynamically monitoring QoS metrics and performance measures, our framework is able to detect whether SLAs' agreed terms are being violated. Additionally, when a violation occurs, our framework provides adaptive remedy actions to rectify the situation. To further ensure trustworthy services, any party not in compliance with the SLAs shall be penalized in monetary terms. In this paper, we demonstrated the validity of the framework by stepping through it with scenarios on response time and availability. Our future work is to implement the framework on a Cloud platform using an e-commerce application as SaaS. Through simulation and experiments on performance evaluation of the framework with the e-commerce application over the Cloud platform, the effectiveness and efficiency of the framework can then be measured.

REFERENCES

[1] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," National Institute of Standards and Technology, US Department of Commerce, Special Publication 800-145. http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf (retrieved June 2015).
[2] P. Aghera, S. Chaudhary, and V. Kumar, "An approach to build multi-tenant SaaS application with monitoring and SLA," 2012 International Conference on Communication Systems and Network Technologies, Gujarat, India, 11-13 May 2012, pp. 658-661.
[3] F. Shaikh and D. Patil, "Multi-Tenant E-commerce based on SaaS Model to Minimize IT Cost," IEEE International Conference on Advances in Engineering & Technology Research, Unnao, India, 1-2 August 2014, pp. 1-4.
[4] W.C.C. Chu, C.T. Yang, C.W. Lu, C.H. Chang, N.L. Hsueh, T.C. Hsu, and S. Hung, "An Approach of Quality of Service Assurance for Enterprise Cloud Computing (QoSAECC)," 2014 International Conference on Trustworthy Systems and Their Applications, Taichung, Taiwan, 9-10 June 2014, pp. 7-13.
[5] A. O. Akande, N. A. April, and J. Van Belle, "Management Issues with Cloud Computing," Proceedings of the Second International Conference on Innovative Computing and Cloud Computing, Wuhan, China, 1-2 December 2013, pp. 119-124.
[6] A. Michlmayr, F. Rosenberg, P. Leitner, and S. Dustdar, "Comprehensive QoS monitoring of Web services and event-based SLA violation detection," The 4th International Workshop on Middleware for Service Oriented Computing, Illinois, USA, 30 Nov.-4 Dec. 2009, pp. 1-6.
[7] J. Meegan, G. Singh, S. Woodward, S. Venticinque, M. Rak, D. Harris, and G. Malekkos, "A Practical Guide to Cloud Service Level Agreements," version 1.0, Cloud Standards Customer Council, 2012, pp. 1-44.
[8] S. A. Baset, "Cloud SLAs: Present and Future," ACM SIGOPS Operating Systems Review, vol. 46(2), July 2012, pp. 57-66.
[9] L. Wang, G. Von Laszewski, A. Younge, X. He, M. Kunze, J. Tao, and C. Fu, "Cloud computing: A perspective study," New Generation Computing, vol. 28(2), April 2010, pp. 137-146.
[10] M. Alhamad, T. Dillon, and E. Chang, "A survey on SLA and performance measurement in cloud computing," Lecture Notes in Computer Science, vol. 7045(2), 2011, pp. 469-477.
[11] R. Grati, K. Boukadi, and H. Ben-Abdallah, "A QoS Monitoring Framework for Composite Web services in the Cloud," The Sixth International Conference on Advanced Engineering Computing and Applications in Sciences, Barcelona, Spain, 23-28 September 2012, pp. 65-70.
[12] D. Serrano, S. Bouchenak, Y. Kouki, T. Ledoux, J. Lejeune, J. Sopena, L. Arantes, and P. Sens, "Towards QoS-oriented SLA guarantees for online cloud services," The 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, Delft, the Netherlands, 13-16 May 2013, pp. 50-57.
[13] V. C. Emeakaroha, T.C. Ferreto, M.A.S. Netto, I. Brandic, and C.A.F. De Rose, "CASViD: Application Level Monitoring for SLA Violation Detection in Clouds," IEEE 36th International Conference on Computer Software and Applications, 16-20 July 2012, pp. 499-508.
[14] M. Rak, S. Venticinque, T. Mahr, G. Echevarria, and G. Esnal, "Cloud Application Monitoring: The mOSAIC Approach," 2011 IEEE Third International Conference on Cloud Computing Technology and Science, 29 Nov.-1 Dec. 2011, pp. 758-763.

