ISBN: 972-8939-03-5 © 2005 IADIS

OPEN SOURCE ARCHITECTURE FOR MANAGEMENT OF LARGE SCALE SERVICE LEVEL AGREEMENTS IN OUTSOURCING

Luke Ho and Anthony Atkins
Faculty of Computing, Engineering & Technology, Staffordshire University
Octagon, Beaconside, Stafford, ST18 0AB, United Kingdom

ABSTRACT

Service Level Agreements (SLAs) are negotiated agreements of common understanding, between customers and service providers, which define the service(s) provided along with their expected levels of performance. They help manage the strategic relationship between companies and Outsourcing service providers, and hence the importance of SLA management has increased with the growth of Outsourcing in recent years. This paper outlines the application of a dynamic PHP-MySQL solution for replacing the manual monitoring of over 1000 SLAs in a large multi-national food and confectionery company. The insourced operations involve large Server Clusters (SCs), so downtime has a significant financial impact on the business. In the case of a major SC, a 1% shortfall in uptime compliance can typically result in a £0.5m loss per annum. The paper describes how data can be collated, using the developed system, from monitoring server(s) at pre-fixed intervals for comparison against preset performance metrics to determine SLA compliance. The system is structured to give on-demand reports which are tabulated to provide an intuitive view of the server status. Reports generated can be exported to various formats, allowing for flexibility in distribution via Intranet and email applications.

KEYWORDS

Service Level Agreement, Management, Outsourcing, Insourcing, Open Source

1. INTRODUCTION

A Service Level Agreement (SLA) can be defined as “a written document, typically fairly brief, including a description of service, service goal measurements and the procedures to be followed when the service is unsatisfactory” (Robson, 1997). The HM Treasury Central Unit on Purchasing (1994) defines it as “a negotiated agreement, agreed between the parties, which quantifies the minimum level of service and sets out costs and criteria for delivery”. Incorporating the various definitions available, the authors define an SLA as a negotiated agreement of common understanding, between customers and service providers, which defines the service(s) provided along with their expected levels of performance, and corresponding penalties and consequences for non-compliance.

SLAs explicitly define the relationship between customers and service providers (Leff et al., 2003), and can be used in the context of any industry (Verma, 2004). They have been touted as the solution to a range of Information Technology (IT) difficulties, ranging from the discord associated with IT chargeback to the elusiveness of metrics for IT performance (Ross et al., 1997), and are becoming increasingly important for ensuring peak performance of enterprise networks (Muller, 1999).

A distinction is made between two types of SLAs, namely the intra-organisational SLA (i.e. SLA for insourced operations) and the inter-organisational SLA (i.e. SLA for outsourced operations), which are briefly outlined as follows:

• Intra-organisational SLAs are typically written by the internal service provider, utilising baseline performance data gathered from monitoring tools and information gathered from end user requirements and service level expectations (Muller, 1999). They provide the internal service provider with a structured approach to service provision and can aid in shifting from a reactive state of incident resolution to a proactive one.

• Inter-organisational SLAs are typically developed from bilateral discussions between the external service providers and the internal end users. They provide a means of clearly defining the expected standard of service levels, corresponding penalties (often financial) for non-compliance, incident escalation procedures, roles and responsibilities of both parties, and more recently an exit strategy in the event that the Outsourcing arrangement fails.

IADIS International Conference e-Society 2005

In both insourced and outsourced operations, customers are on the whole looking for reliable, measurable and defined levels of service from their service providers, hence SLAs have gained wider acceptance as a governance tool for monitoring and management of service provision. With increasingly competitive markets introducing further pressures to cut costs, companies are now more inclined than ever to utilise Outsourcing (Ho et al., 2004). This has spurred the continued growth of Outsourcing over recent years and resulted in an increased implementation of SLAs.

2. COMPONENTS OF AN SLA

Because SLAs can be applied in many different contexts, the features of individual SLAs may vary. However, at the minimum, an ideal SLA should comprise the following components:

• Service Definition / Description provides a brief statement or outline, describing the type of service to be provided and qualifications of the type of service to be provided (Verma, 2004), which acts as an overview or introduction to the context in which the SLA is applied. [Example: “The Internet Access service provides the customer with secure, reliable and dedicated access to the Internet in line with current company policies with regards to Internet usage”]

• Service Hours relates to the times in which the SLA is enforced, which are typically tied in to normal business hours of 9am to 5pm. [Example: “Monday through Friday from 09:00 to 17:00”]

• Availability refers to the degree to which the service or application can be provided to the customer without interruption or degradation, which is usually expressed in terms of percentage over a fixed time period. [Example: “During service hours, the service should be available 99.99% of the time over a period of one year”]

• Reliability relates to the consistency of the service or application in meeting its pre-defined performance standard, which is usually expressed in terms of number of occurrences over a fixed time period. [Example: “During service hours, the service should not be down more than 2 times in a month”]

• Performance Metrics / Measures refer to the various service metrics which are applied to the SLA to ascertain compliance with the performance standard. [Example: “Each user of the Internet Access service should have a bandwidth of at least 64kbps from the data centre to the Internet Service Provider backbone”]

• Support relates to the degree of assistance provided in the event of incidents and is typically tied in with the business importance of the specific service or application. This is normally expressed as the time taken from failure notification to incident resolution. [Example: “The target resolution time for a Type I incident is 30 minutes”]

• Backup / Restore procedures cover contingency planning and recovery procedures for restoring the service or application in the event of full data loss or severe failure. [Example: “In the event of service failure, the service should be restored in under 4 hours”]

• Evaluation refers to the frequency with which the SLA is reviewed for appropriateness and for any modifications needed to cater for changes in the business environment. [Example: “The SLA will be evaluated bimonthly by a review committee comprised of 2 members from the service provider, 2 members from the customer and 1 member of an independent consultancy organisation”]

• Limitations / Constraints cover the restrictions or exceptional circumstances with regards to the SLA’s validity, detailing situations outside the service provider’s control that could affect service provision. [Example: “The time duration in which Internet Access service fails due to loss of functionality between global Internet Exchanges, e.g. London Internet Exchange (LINX), will not be taken into account as service downtime”]

• Confidentiality refers to the assurance that privileged information, gained during the course of the relationship between customer and service provider, is not disclosed to unauthorised personnel, processes or devices.

• Prerequisites relates to the prior condition(s) required or necessary before the service or application can be implemented and is typically expressed in terms of equipment, current setup or business procedures.

• Financial Charges and Penalties detail the costs related to the provision of the service or application, typically divided into a one-off implementation component along with an on-going “run and support” component, and the financial consequences in the event that the service provider does not meet its obligations.

• Approval refers to the section for the respective “signatory parties” (Ludwig et al., 2002), at both the customer and service provider end, to sign off to acknowledge that the SLA includes mutually agreeable metrics and accurately details what the end users can expect from the service provider in terms of service provision and performance standards.
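The Availability and Reliability examples above can be turned into a simple numeric check. The following is a minimal Python sketch, not the authors' implementation (their system is written in PHP); the names `SlaTargets`, `availability_pct` and `is_compliant` are hypothetical, and the yearly figure assumes service hours of Monday to Friday, 09:00 to 17:00, for 52 weeks.

```python
from dataclasses import dataclass


@dataclass
class SlaTargets:
    """Hypothetical container for two of the SLA components listed above."""
    min_availability_pct: float   # Availability target, e.g. 99.99
    max_outages_per_month: int    # Reliability target, e.g. 2


def availability_pct(service_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the agreed service time during which the service was up."""
    if service_minutes <= 0:
        raise ValueError("service_minutes must be positive")
    return 100.0 * (service_minutes - downtime_minutes) / service_minutes


def is_compliant(targets: SlaTargets, service_minutes: float,
                 downtime_minutes: float, outages_this_month: int) -> bool:
    """True only if both the availability and the reliability targets are met."""
    available = availability_pct(service_minutes, downtime_minutes)
    return (available >= targets.min_availability_pct
            and outages_this_month <= targets.max_outages_per_month)


# Assumed service hours: 8 hours x 5 days x 52 weeks, expressed in minutes.
YEARLY_SERVICE_MINUTES = 8 * 60 * 5 * 52

# Downtime budget implied by a 99.99% target over one year of service hours
# (roughly twelve and a half minutes).
allowed_downtime = YEARLY_SERVICE_MINUTES * (100 - 99.99) / 100

targets = SlaTargets(min_availability_pct=99.99, max_outages_per_month=2)
```

Under these assumptions a 99.99% availability target leaves a downtime budget of only about 12.5 minutes per year of service hours, which illustrates why such targets are difficult to verify by manual monitoring.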

3. COMPANY BACKGROUND

ACME is a multi-national company with many dealings, primarily in the food and confectionery industry. In a bid to create a single centrally managed organisation for the delivery of IT services to its subsidiary companies, it created three Regional IT Service Centres (RITSCs), which were formed by a fusion of existing staff integrated from the IT departments of its subsidiaries along with newly recruited staff. Each RITSC provides IT services to ACME’s subsidiaries in one of three regions: Americas, Europe and Asia-Pacific. The insourced IT operations are governed by SLAs, which are manually monitored at the respective headquarters office of each region. The staff cost of the monitoring operations at the European headquarters is typically £0.875m per annum, rising to over £1m per annum once the cost of equipment and administrative services is taken into account. The busiest Server Cluster (SC) within ACME’s European operations is EuroX, which handles over 60% of the £1.18bn per annum worth of transactions for the region. As such, EuroX’s compliance with the performance standards defined in the SLA is crucial to ACME: a 1% shortfall in uptime compliance can typically result in a £0.5m loss per annum.

4. SERVICE LEVEL AGREEMENT MANAGEMENT SYSTEM (SLA-MS)

4.1 Architecture

Figure 1. SLA-MS architecture


The architecture of the Service Level Agreement Management System (SLA-MS), modelled around the enterprise network structure of ACME, involves a tri-server setup, as illustrated in Figure 1, comprised of the following:

• Monitoring Server
• Database Server
• Web Server

It is proposed that PHP (recursive acronym for PHP: Hypertext Preprocessor) be used for the web front-end while MySQL is used as the database back-end. Other than the requirement of being XML-capable, the monitoring server can utilise any form of in-house proprietary or commercially sourced software. The PHP-MySQL solution requires no code modification for cross-platform deployment and has been successfully deployed on Microsoft Windows (NT/2000/XP), Linux and MacOS X platforms within ACME. PHP applications are also deployable on both the Microsoft Internet Information Services (IIS) and Apache web servers, which allows for implementation flexibility in server architecture planning.

The choice of PHP-MySQL represents an open-source approach, which improves affordability by reducing licensing costs (as compared to commercial software such as the Oracle 9i or MS SQL Server databases). This allows the developed SLA-MS to be deployed even in companies that have limited financial resources allocated for IT purposes. As PHP is a server-side scripting language, the PHP webpages rendered by the SLA-MS are not resource-intensive on the client side and thus can be accessed by systems with low resources (e.g. low processor speed and system memory).

In terms of MySQL support, there is a variety of options available, ranging from Basic Support plans costing £1,400 (for small-scale operations requiring ad-hoc and occasional support) to the Premier Support plans costing over £40,000 (for enterprise operations requiring round-the-clock 24/7 support). This variable cost support arrangement allows customers to select a support plan as required, within their technical requirements and financial ability, thus facilitating adoption by both Small & Medium Enterprises (SMEs) and large Multi-National Companies (MNCs). MySQL databases have also been successfully implemented in large-scale operations, such as the Ensembl genome database project and transposon-mediated sequencing of cDNA clones (Butterfield et al., 2002; Hubbard et al., 2002), and the latest version (i.e. MySQL 5.0.3 beta) now supports data-processing running across databases from different suppliers (ComputerWeekly, 2005).

4.2 Operation

Users initially interact with the SLA-MS via the Management screen, which allows for the creation, updating and deletion of SLAs. Users create a new SLA by entering a combination of descriptive information (e.g. service description) and metrics (e.g. response time) into the web template, akin to filling in a paper-based form. Upon creation of a new SLA, PHP code issues an SQL query, as illustrated in Figure 2, to store the SLA details in the MySQL database and add the specified server(s) and/or server cluster(s) to the monitored list. At pre-fixed intervals, the Monitoring Server collects diagnostic data (e.g. availability of clients, mean application response time and server uptime) from the monitored servers and/or server clusters. This data is then transformed from XML (eXtensible Markup Language) format into an SQL (Structured Query Language) query, which is executed to store the data in the MySQL database located on the Database Server.
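The XML-to-SQL step described above can be sketched as follows. This is an illustrative Python example and not the authors' PHP code: the XML layout, the element and attribute names, the server names and the `readings` table are all assumptions, and the standard-library `sqlite3` module stands in for the MySQL database so that the sketch is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical diagnostic document as a Monitoring Server might emit it.
SAMPLE_XML = """
<diagnostics collected_at="2005-01-10T09:00:00">
  <server name="EuroX-01">
    <uptime_pct>99.995</uptime_pct>
    <mean_response_ms>180</mean_response_ms>
  </server>
  <server name="EuroX-02">
    <uptime_pct>99.2</uptime_pct>
    <mean_response_ms>420</mean_response_ms>
  </server>
</diagnostics>
"""


def parse_readings(xml_text):
    """Turn the XML document into one tuple per monitored server."""
    root = ET.fromstring(xml_text.strip())
    collected_at = root.get("collected_at")
    rows = []
    for server in root.findall("server"):
        rows.append((
            server.get("name"),
            collected_at,
            float(server.findtext("uptime_pct")),
            float(server.findtext("mean_response_ms")),
        ))
    return rows


def store_readings(conn, rows):
    """Insert the readings using a parameterised SQL statement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "server TEXT, collected_at TEXT, uptime_pct REAL, mean_response_ms REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# In-memory database used here in place of the MySQL Database Server.
conn = sqlite3.connect(":memory:")
store_readings(conn, parse_readings(SAMPLE_XML))
stored = conn.execute(
    "SELECT server, uptime_pct FROM readings ORDER BY server"
).fetchall()
```

Passing the values as query parameters, rather than concatenating them into the SQL text, keeps malformed or hostile XML content from altering the statement.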


Figure 2. Interactions between PHP webpage and MySQL database

Figure 3. SLA-MS report choice screen (left) and sample Excel report (right)

When an SLA report is requested via the web interface, the Web Server requests the relevant data from the Database Server, calculates the related metrics and tabulates them into an intuitive report, which can be exported to a file format of the user’s choice as outlined in Figure 3. The SLA-MS currently supports exporting reports to Microsoft Word, Microsoft Excel and Adobe PDF formats for electronic (e.g. email) and online (e.g. corporate web portal) distribution. Depending on the file format selected, the report uses either colour-coding (e.g. green, yellow and red) or emphasis (e.g. bold or italic text) to indicate each SLA’s compliance with its respective performance metrics. The Web Server can be configured so that the SLA-MS uses Simple Mail Transfer Protocol (SMTP) to forward the report directly to the email addresses of specified recipients.
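The tabulation and colour-coding step can be illustrated with a short Python sketch. The paper does not state the rules behind each colour, so the meaning assigned here (green when the target is met, yellow when the value falls within an assumed margin below the target, red otherwise) and the names `status_colour` and `build_report` are hypothetical, as are the server names and readings.

```python
def status_colour(measured, target, warning_margin=0.5):
    """Classify a higher-is-better metric against its SLA target.

    The warning_margin value is an assumption for illustration only.
    """
    if measured >= target:
        return "green"
    if measured >= target - warning_margin:
        return "yellow"
    return "red"


def build_report(rows, target):
    """Tabulate (server, uptime %) pairs as a plain-text report."""
    lines = [f"{'Server':<12}{'Uptime %':>10}  Status"]
    for server, uptime in rows:
        colour = status_colour(uptime, target)
        lines.append(f"{server:<12}{uptime:>10.3f}  {colour}")
    return "\n".join(lines)


# Hypothetical readings checked against an assumed uptime target of 99.5%.
report = build_report(
    [("EuroX-01", 99.995), ("EuroX-02", 99.2), ("EuroX-03", 97.0)],
    target=99.5,
)
```

A plain-text table is used here for brevity; producing Word, Excel or PDF output as the SLA-MS does would require format-specific libraries that are outside the scope of this sketch.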


4.3 Cost Savings

Implementing the PHP-MySQL solution to monitor SLAs automatically could reduce the staff costs of the previously manual system by up to 80%. The breakdown of the cost changes is detailed as follows:

Table 1. Breakdown of ACME cost savings

Item                                 Before        After                                   Change
Staff costs                          £875,000      £175,000                                - £700,000
Equipment and Administrative costs   £125,000+     £125,000+                               Unchanged
Support costs                        Unknown       £40,000 [MySQL Premier Support Plan]    + £40,000
Development costs of the SLA-MS      N/A           £92,400 [1680 man-hours x £55]          + £92,400
TOTAL                                £1,000,000+   £432,400+                               £567,600+

The increased support cost arises from ACME’s take-up of the MySQL Premier Support Plan, which provides 24/7 remote and on-site troubleshooting and supplements the company’s internal technical expertise with external expert support. As the SLA-MS can be deployed on existing IT facilities, the cost of equipment and administrative services remains unchanged. The combined result is a major cost saving of over 56% for ACME, even after taking into account the development costs of the SLA-MS.
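The figures in Table 1 can be re-derived with a few lines of arithmetic. The Python sketch below simply recomputes the totals; it treats the "£125,000+" equipment and administrative cost as exactly £125,000 (an assumption, since the paper gives only a lower bound) and follows the table in counting the one-off development cost alongside the annual figures.

```python
# Costs before the SLA-MS, in pounds (Table 1, "Before" column).
before = {
    "staff": 875_000,
    "equipment_admin": 125_000,   # lower bound in the paper, assumed exact here
}

# Costs after the SLA-MS (Table 1, "After" column).
after = {
    "staff": 175_000,
    "equipment_admin": 125_000,   # unchanged
    "support": 40_000,            # MySQL Premier Support Plan
    "development": 1680 * 55,     # 1680 man-hours at £55 per hour
}

total_before = sum(before.values())
total_after = sum(after.values())
saving = total_before - total_after

# Overall saving as a percentage of the original total.
saving_pct = 100 * saving / total_before

# Reduction in staff costs alone.
staff_reduction_pct = 100 * (before["staff"] - after["staff"]) / before["staff"]
```

With these inputs the totals come to £1,000,000 before and £432,400 after, a saving of £567,600 (about 56.8%), and the staff cost reduction is 80%, matching the figures quoted in the text.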

5. CONCLUSIONS AND FUTURE WORK

The paper outlined Service Level Agreements (SLAs) and highlighted their increased acceptance as a governance tool for monitoring and management of service provision. Inter-organisational and intra-organisational SLAs have been discussed, distinguishing between the characteristics of SLAs utilised for outsourced and insourced operations. SLAs can be used in the context of any industry, and hence the features of one SLA may vary from the next. However, the ideal SLA should comprise the following components: Service Definition / Description, Service Hours, Availability, Reliability, Performance Metrics / Measures, Support, Backup / Restore, Evaluation, Limitations / Constraints, Confidentiality, Prerequisites, Financial Charges and Penalties, and Approval.

In a competitive business, it is imperative to be able to manage the execution of service provision. The implementation of an SLA Management System (SLA-MS) is desirable to facilitate a systematic approach to such management. The authors propose an open-source PHP-MySQL solution, involving variable cost components (depending on the level of MySQL support required), which facilitates adoption by both Small & Medium Enterprises (SMEs) and large Multi-National Companies (MNCs).

Currently, the developed SLA-MS operates only in a semi-automatic state. Although the collection of diagnostic data is automatic and reports can be generated (with SLA violations flagged) on demand, the system does not automatically identify such violations in real-time and thus is not considered to be fully proactive in nature. An ideal SLA-MS should predict imminent SLA violations and act proactively (Leff et al., 2003), hence future development efforts will concentrate on the incorporation of Artificial Intelligence (AI) techniques to provide the SLA-MS with the analytical capabilities required to pre-empt imminent SLA violations and issue advance warnings to service level managers and/or system controllers.

Overall, the use of an SLA-MS to replace the manual monitoring of SLAs could result in a 56% (i.e. over £0.5m) cost saving for ACME, which indicates how the adoption of an SLA-MS is beneficial to companies seeking to lower costs and, more importantly, to automate service level monitoring.

REFERENCES

Butterfield, Y. S. N. et al., 2002. An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones, In Nucleic Acids Research, Vol. 30, No. 11, pp. 2460-2468.
ComputerWeekly, 2005. MySQL releases beta upgrade for developers, In Computer Weekly, 05 April.
Engel, F., 1999. The Role of Service Level Agreements in the Internet Service Provider Industry, In International Journal of Network Management, Vol. 9, No. 5, pp. 299-301.
Hubbard, T. et al., 2002. The Ensembl genome database project, In Nucleic Acids Research, Vol. 30, No. 1, pp. 38-41.
Leff, A. et al., 2003. Service-Level Agreements and Commercial Grids, In Internet Computing, Vol. 7, No. 4, pp. 44-50.
Ludwig, H. et al., 2002. A Service Level Agreement Language for Dynamic Electronic Services, Proceedings of the 4th IEEE International Workshop on Advanced Issues of E-Commerce and Web-Based Information, pp. 25-32.
HM Treasury Central Unit on Purchasing, 1994. Service Level Agreements, HM Treasury, London, CUP Guidance No. 44.
Ho, L. et al., 2004. The Growth of Outsourcing and Application of Strategic Framework Techniques, In CD-ROM Proceedings of the European and Mediterranean Conference on Information Systems (EMCIS) 2004, Tunis, Tunisia.
Muller, N. J., 1999. Managing Service Level Agreements, In International Journal of Network Management, Vol. 9, No. 3, pp. 155-166.
Robson, W., 1997. Strategic Management & Information Systems, 2nd Edition, Pitman Publishing, London, ISBN 0-273-61591-2.
Ross, J. W. et al., 1997. Service Level Agreements at Texas Instrument: Theory Meets Practice, In Proceedings of the 18th International Conference on Information Systems (ICIS) 1997, Atlanta, United States, pp. 527-528.
Verma, D. C., 2004. Service Level Agreements on IP networks, In Proceedings of the IEEE, Vol. 92, No. 9, pp. 1382-1388.
