Policies, Grids and Autonomic Computing
Bradley Simmons
Department of Computer Science, The University of Western Ontario, London, Ontario, Canada N6A 5B7
[email protected]
Hanan Lutfiyya
Department of Computer Science, The University of Western Ontario, London, Ontario, Canada N6A 5B7
[email protected]
ABSTRACT
The goals of resource management fall within the overall aims of autonomic and grid computing, namely the sharing of resources automatically, and the allocation of resources depending on both application and business needs. Resource allocation can be guided by policies which encapsulate decisions made by the management system. Policies can be used to encapsulate many different types of management decisions including possible corrective actions when a performance requirement of an application is not being satisfied and actions to take place when there is more demand than supply. System policy is derived from the interactions between Service Level Agreements (contractual agreements between businesses) and locally specified management rules. This paper explores the potential use of mathematical models (e.g., optimisation models) for relating the various types of policies. It describes the current and proposed work in applying policies to resource management in the context of autonomic and grid computing systems.
Categories and Subject Descriptors C.4 [Computer Systems Organization]: Distributed Systems; Performance of Systems [Modeling Techniques]
General Terms Autonomic Computing
Keywords Policies, resource management, optimisation
1. INTRODUCTION There are a number of concepts embodied in the terms grid computing [13] and autonomic computing [7] that are at the core of recent developments in computer science. Grid computing refers to the use and coordination of dispersed computer resources that include (i) hardware, such as servers, workstations, desktops, and storage; (ii) software, such as reusable components and licences; and (iii) wetware, the organisational implementation and human interaction with the software and hardware. Such grid resources typically span location and even organisational boundaries. Autonomic computing refers to systems that are self-protecting, self-healing, self-optimising and self-configuring. In fact, it is a term that conveniently summarises many of the common aims of contemporary software development. Although much of the autonomic computing vision has been defined by IBM [7], other major players have suggested similar concepts [10].
The concepts inherent to grid computing were initially used in numerically intensive scientific computing applications (e.g., [1]). Recently there has been the desire to develop grids for commerce, based on the premise that a grid enables more efficient sharing and utilisation of computing power and storage. It has been estimated that commercial computing resources reach only between 10% and 20% utilisation of their full potential [14]. IBM and Charles Schwab’s Advanced Technology Group recently announced that they were able to reduce the processing time of an existing financial application from more than four minutes to 15 seconds by grid enabling it with the Globus Toolkit running RedHat Linux on IBM xSeries 330 machines [18]. Another key aspect that is addressed in the grid realm is the dynamic nature of the computing resources needed by applications. One approach for ensuring the satisfaction of the computing needs of a particular application is to provide enough resources to meet anticipated peak demand. Typically this leads to paying for unused resources, and is not a good solution. For example, if an enterprise needs n web servers 95% of the time, with 3n servers required the other 5% of the time, a better solution than having three times as many servers as are typically needed may be to use computing and data resources from third party providers as needed, or to use surplus capacity for service provisioning.
The allocation and scheduling of computing resources is known as resource management, hereafter referred to as management. The goals of management fall within the overall aims of autonomic and grid computing, namely the sharing of resources automatically, and the allocation of resources depending on both application and business needs. Resource allocation can be guided by policies which encapsulate decisions made by the management system. Policies can be used to encapsulate many different types of management decisions including possible corrective actions when a performance requirement of an application (which will be referred to as a service level objective) is not being satisfied and actions to take place when there is more demand than supply.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DEAS 2005, May 21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-59593-039-6/05/0005 ...$5.00.
all of the demand, e.g., application $a_i$ has higher priority than $a_j$; (5) An authorisation policy is used to determine whether a specific action can be authorised for a specific initiator in response to a violation of a service-level objective; (6) Other policies can be used to determine the cost of not satisfying an application’s resource demands and the profit to be made from a specific resource configuration.
This paper briefly describes the current and proposed work in applying policies to resource management in the context of autonomic and grid computing systems.
Formulating policies is further complicated if it is assumed that different applications share components, e.g., $a_{ij} = a_{mn}$, in which case it may be feasible to allow the instantiations $a^{k}_{ij}$ and $a^{l}_{mn}$ to refer to the same process. An example of such a component is a database server.
2. POLICIES A grid can be defined as $G = (H, L)$ where $H = \{h_0, \ldots, h_n\}$ represents the set of hardware resources and $L = \{l_0, \ldots, l_m\}$ represents software resources. In this research the software resources are software licences. The set of applications is denoted by $A = \{a_0, \ldots, a_r\}$ and the $j$th component of application $a_i$ is denoted by $a_{ij}$. An instantiation of the $j$th component of application $a_i$ is denoted by $a^{k}_{ij}$ ($k$ is assumed to uniquely identify the instantiation). A resource configuration is defined as an allocation of resources to application instances, $RC = \langle Assign_h, Assign_l \rangle$, where $Assign_h = \{(a^{k}_{ij}, h_q) \mid a_i \in A, h_q \in H\}$ and $Assign_l = \{(a^{k}_{ij}, l_q) \mid a_i \in A, l_q \in L\}$.
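The definitions above can be read as a small data model. The following is a minimal sketch in Python, not part of the original work: the class names `Instance`, `Grid` and `ResourceConfiguration`, the method `is_well_formed`, and the string identifiers for hosts and licences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """One instantiation a^k_ij: application i, component j, instance id k."""
    app: int
    component: int
    k: int


@dataclass
class Grid:
    """G = (H, L): hardware resources and software licences."""
    hardware: frozenset
    licences: frozenset


@dataclass
class ResourceConfiguration:
    """RC = <Assign_h, Assign_l> as two sets of (instance, resource) pairs."""
    assign_h: set = field(default_factory=set)
    assign_l: set = field(default_factory=set)

    def is_well_formed(self, grid: Grid, apps: set) -> bool:
        # Every pair must name a known application and a resource in the grid.
        hosts_ok = all(inst.app in apps and h in grid.hardware
                       for inst, h in self.assign_h)
        licences_ok = all(inst.app in apps and l in grid.licences
                          for inst, l in self.assign_l)
        return hosts_ok and licences_ok


# Hypothetical example: one web-server instance placed on h0 with licence l0.
grid = Grid(hardware=frozenset({"h0", "h1"}), licences=frozenset({"l0"}))
web = Instance(app=0, component=0, k=0)
rc = ResourceConfiguration(assign_h={(web, "h0")}, assign_l={(web, "l0")})
```

The check only enforces the membership conditions $a_i \in A$, $h_q \in H$ and $l_q \in L$ from the set definitions; policies would add further restrictions on which configurations are acceptable.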
Based on this initial analysis we have the following observations: (1) The types of policies that may exist come from different sources. For example, a service-level objective arises from the interaction between the provider and the consumer of resources. A resource conflict policy is associated with the goals of the provider. Configuration policies often depend on the application; (2) Being able to use one language for the specification of all policies may be difficult since the policies are from different sources and at different levels of detail; (3) The policies are not independent of each other. A model is needed that represents the potentially complex relationships among them.
The value of RC depends on policies. The term policy is often interchanged with terms such as goal and objective [9, 22]. Goh [8] provided these definitions: an objective is a description of what is to be achieved at a high level; a low-level mechanism for achieving specific measurable results is referred to as an implementable; and a policy is a description of the constraints imposed on achieving an objective. In this paper, each of these is referred to as a policy.
3. UNDERSTANDING AND FORMALISING RELATIONSHIPS BETWEEN POLICIES
The relationships between the policies are important as well since implied decisions may have interdependencies. For example, an instantiation of application $a_i$ may require five machines and an instantiation of application $a_j$ may require six machines. The demand is for eleven machines but there may only be seven. An obvious allocation is to provide $a_i$ with five machines and $a_j$ with two machines. However, the cost of not satisfying the demands of application $a_j$ may outweigh the benefit of satisfying $a_i$’s request. Configuration policies may further restrict the allocation of resources to applications.
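The trade-off in this example can be made concrete with a short enumeration. The sketch below is illustrative only: it assumes a linear penalty per missing machine for each application, which the text does not specify, and the function name `cheapest_split` and its penalty parameters are hypothetical.

```python
def cheapest_split(total, demand_i, demand_j, penalty_i, penalty_j):
    """Try every split of `total` machines and return (cost, machines_i, machines_j).

    The cost is an assumed linear penalty per machine of unmet demand.
    """
    best = None
    for m_i in range(min(total, demand_i) + 1):
        # Give a_j whatever is left, but never more than it asked for.
        m_j = min(total - m_i, demand_j)
        cost = penalty_i * (demand_i - m_i) + penalty_j * (demand_j - m_j)
        if best is None or cost < best[0]:
            best = (cost, m_i, m_j)
    return best


# Seven machines, a_i wants five, a_j wants six.
# If a shortfall for a_j is three times as costly, a_j should be served first.
favour_j = cheapest_split(7, 5, 6, penalty_i=1, penalty_j=3)
# If a shortfall for a_i is three times as costly, the "obvious" split wins.
favour_i = cheapest_split(7, 5, 6, penalty_i=3, penalty_j=1)
```

Under these assumed penalties the "obvious" allocation of five and two machines is the cheapest only in the second case; in the first case it would cost 12 against 4 for the split that serves $a_j$ in full.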
An initial analysis suggests that there are many kinds of policies including the following: (1) A service-level objective policy associated with an application refers to nonfunctional, run-time requirements that may be defined with respect to response time, throughput or availability, e.g., the average response time for requests to an application should be less than x seconds; (2) A system-level objective policy refers to requirements associated with the resource configuration. Examples include the following: the cost associated with the violation of an application’s service level objectives should be minimised and the profit should be maximised; (3) A configuration policy specifies restrictions on the placement of application instances, and the minimum and maximum number of instantiations of an application allowed. Examples include the following: (i) There should not be more than l instances of ai ; (ii) A variation of (i) is that if the number of instantiations of ai is greater than l then the cost associated with an instantiation changes to a certain value; (iii) The “distance” from one component of ai to another component in any instantiation should be no more than d. For example, let ai be an application consisting of two components: ai0 which is a web server and ai1 which is a database server. Since the web server is expected to heavily interact with the database server the number of hops between the web server and the database server should be minimised. (iv) An application component aij should run on a machine with a large amount of disk space. For example if aij is a database server then it should only execute on machines with a certain amount of disk space; (4) A resource conflict policy specifies constraints on resource allocations when there are not enough resources to satisfy
We will examine the use of optimisation models and control-theoretic models for relating policies. We note that we do not believe that all policies can be related to each other using these models. We are not the first to identify the need to relate the different policies. Work found in [3] emphasises the importance of determining the relationships between the client objectives and the existing resource infrastructure. However, the focus of that work is on translating performance requirements to the number of nodes in the cluster to be used. Currently the prototype does not reflect the use of most of the policies described in this section.
3.1 Optimisation Models
This section illustrates how policies can be mapped to an optimisation model. Let $x_{ijk}$ be a variable that is equal to one if $a_{ij}$ is instantiated on $h_k$. Otherwise it is equal to zero. Let $y_{ijk}$ be a variable that is equal to one if $a_{ik}$ is instantiated on $h_j$ and assigned a licence $l_k$. Otherwise it is equal to zero. If the application component $a_{ij}$ is not allowed to execute on $h_k$ then $x_{ijk} = 0$ and $y_{ijk} = 0$ are constraints in the optimisation model. If the number of instantiations of an application component $a_{ij}$ must be less than $t$ then $\sum_{k=0}^{n} x_{ijk} < t$ is a constraint in the optimisation model. If the number of instantiations of an application component $a_{ij}$ must be greater than $t$ then $t - \sum_{k=0}^{n} x_{ijk} < 0$ is a constraint. If two applications have components that are to be shared, e.g., a database server, then this also must be formulated as a constraint. For an application component $a_{ij}$, its resource demands can be represented by $d_{ij} = \langle d_{ijH}, d_{ijL} \rangle$. This can be estimated based on the service-level objectives. Assume that the system-level objective is to minimise the cost associated with violating service-level objectives. A cost function $f_C$ is defined that, given an application’s demand and supply, computes the cost of not satisfying the application’s service-level objectives. For an application $a_i$ cost is computed as $c_i = f_C * \sum_{j=0}^{n} (x_{ijk} - d_{ijH})$. The objective function of the optimisation problem is to minimise $\sum_{i=0}^{r} c_i$. A function is also needed for profit. We define a profit function $f_P$ that, given an application’s supply, computes the profit. For an application $a_i$ profit is computed as $p_i = f_P * \sum_{j=0}^{n} x_{ij}$. Thus an objective function could be to minimise $(\sum_{i=0}^{r} p_i + \sum_{i=0}^{r} c_i)$.
real-time while in the former it may not always be necessary to consider a real-time solution.
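To make the 0-1 formulation of this section more tangible, the sketch below enumerates every assignment of the placement variables for a tiny instance and keeps the cheapest feasible one. It is a hedged illustration rather than the authors' method: exhaustive enumeration stands in for a real ILP solver, the cost is an assumed shortfall penalty (`unit_cost` times unmet demand) in place of $f_C$, the rule that each host runs at most one instance is an added assumption, licences are omitted, and all names (`solve_placement`, `forbidden`, `max_instances`) are hypothetical.

```python
from itertools import product


def solve_placement(components, hosts, demand, forbidden, max_instances, unit_cost):
    """Enumerate all 0-1 assignments x[(c, h)] and return (cost, x) for the cheapest.

    cost = sum over components of unit_cost[c] * max(0, demand[c] - supply[c]),
    an assumed shortfall penalty standing in for the cost function f_C.
    """
    cells = [(c, h) for c in components for h in hosts]
    best_cost, best_x = None, None
    for bits in product((0, 1), repeat=len(cells)):
        x = dict(zip(cells, bits))
        # Configuration policy: a component may not run on a forbidden host.
        if any(x[cell] for cell in forbidden):
            continue
        # Assumed capacity rule: each host runs at most one component instance.
        if any(sum(x[(c, h)] for c in components) > 1 for h in hosts):
            continue
        supply = {c: sum(x[(c, h)] for h in hosts) for c in components}
        # Configuration policy: upper bound on instantiations per component.
        if any(supply[c] > max_instances[c] for c in components):
            continue
        cost = sum(unit_cost[c] * max(0, demand[c] - supply[c]) for c in components)
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x


# Hypothetical instance: three hosts, two components that each want two hosts.
# The database may not run on h0 and is five times as costly to under-provision.
cost, x = solve_placement(
    components=["web", "db"],
    hosts=["h0", "h1", "h2"],
    demand={"web": 2, "db": 2},
    forbidden={("db", "h0")},
    max_instances={"web": 2, "db": 2},
    unit_cost={"web": 1, "db": 5},
)
```

With $2^6$ candidate assignments this is trivial, but the search space doubles with every added variable, which is why the structure-exploiting algorithms discussed in this section matter for realistic grids.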
The work most closely related to this work is being carried out at HP Laboratories in Palo Alto (e.g., [21]). The work differs as follows: (i) Our proposed work includes a detailed and systematic study of policies. Already the initial analysis conducted has examined policies more complicated than those described in [21]. The policies they examined focussed primarily on resource conflict policies and simple configuration policies that essentially state whether an application component can be migrated or not. Their work does not take into account licensing issues nor any of the other more complex types of configuration policies discussed earlier; (ii) We propose to develop algorithms that use the structure of the problem so that they need less than exponential time, and so that changes in policies do not necessarily mean that computations have to start from scratch. We have a good deal of experience in this [2]. So far this has not been addressed by other researchers.
The formulation of this optimisation model is based on a 0-1 Integer Linear Programming (ILP) model. There are potentially a number of different algorithms that provide either optimal solutions or approximate, suboptimal results. Popular methods include simulated annealing, genetic algorithms, branch and bound, and cutting planes. In previous work [2] it was shown that a generic branch and bound algorithm that takes advantage of structural insights into a specific problem will often find an optimal solution in less than exponential time. This was applied to the problem of efficiently configuring management agents based on a set of system and management requirements. If a system or management requirement changed, it was not necessary to start computation of a new configuration from scratch. This suggests that the same approach can be applied to the optimisation model developed here, so that if any of the policies change, it is not necessary to start computation from scratch. We will also look at a technique based on Markov Chains [5].
3.2 Control-Theoretic Models
The use of control theory principles will be explored. This approach uses functions that represent the output of a system. A function is used to calculate the value of an output control parameter. The function is periodically computed based on monitored data. The value computed is compared to a reference value in order to calculate an error value, which is the difference between the reference value and the value computed. Tuning variables are adjusted based on the error. In the environment being considered, system-level objectives can be used to derive a function representing an output control parameter. Thus, a function could be the cost of violating service-level objectives. A tuning parameter could be the number of a type of computing resource allocated to a specific application. The adjustment of the tuning parameter is an action that can be taken. The adjustment may depend on resource conflict policies. This would not be the first work to apply control theory principles to computer systems; examples can be found in [4, 6]. We believe that the application of control theory principles in grids is new. We do not anticipate the use of a single feedback loop for all of the grid. Rather, there will be multiple feedback loops whose interaction must be controlled.
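A single feedback loop of the kind described above can be sketched in a few lines. This is only an illustration under strong assumptions: the monitored output is a response time modelled by the hypothetical plant `load / servers`, the controller is a simple proportional rule, and the names `control_step`, `simulate` and `gain` are invented for the example.

```python
def control_step(servers, measured, reference, gain, max_servers):
    """Adjust the tuning variable (number of servers) from the error value."""
    error = reference - measured          # negative when the objective is violated
    adjusted = servers - gain * error     # add capacity when the error is negative
    return max(1, min(max_servers, round(adjusted)))


def simulate(load, reference, gain=2.0, max_servers=20, steps=10):
    """Run the loop against a hypothetical plant where response time = load / servers."""
    servers = 1
    history = []
    for _ in range(steps):
        measured = load / servers         # stand-in for monitored data
        servers = control_step(servers, measured, reference, gain, max_servers)
        history.append(servers)
    return history


# Hypothetical run: a load of 8.0 units and a 1.0 second response-time reference.
history = simulate(load=8.0, reference=1.0)
```

In this run the controller first overshoots and then releases servers until the adjustment rounds to zero, settling above the minimum that would meet the reference. Choosing the gain, and coordinating several such loops that compete for the same machines, is where resource conflict policies would enter.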
It should be noted that, depending on the policies, it may not be necessary to use 0-1 ILP. For example, if there are no policies that specify restrictions on the location of application instances then it is possible to use a linear programming model, which, unlike 0-1 ILP, can be solved in polynomial time. The research will examine these issues: (1) A more detailed study of policies will be conducted in conjunction with our industrial partner; (2) Development of optimisation models that take into account different kinds of resources, e.g., storage, and the fact that an application is distributed. The latter implies that dependencies between application components cause network bandwidth to be included as a resource. We should be able to draw upon the experiences of other researchers (e.g., [12]); (3) Algorithms for solving optimisation models will be studied and evaluated based on their responsiveness to changes in policies. The design of algorithms will depend on whether resource re-allocation is considered at regular intervals or is triggered dynamically as the result of too many violations of service-level objectives. In the latter case, the heuristic must provide a good solution in
3.3 Other Policies
Not all policies will be mapped to the models described in sections 3.1 and 3.2. For example, authorisation policies will often be dynamic since the privileges of a user are dynamic. Although it may be possible to represent this in the optimisation model, the dynamic nature implies that the model constantly changes and the resource allocation would have to be recomputed each time, which is not feasible in an environment where a solution is required in real-time. It is important to determine which policies should be mapped to a model and the relationship between those policies and the policies not mapped to a model.
4. POLICY SPECIFICATION, TRANSFORMATION AND DISTRIBUTION
So far, we have described the relationship between policies in terms of optimisation and control-theoretic models. These models could be rather difficult for administrators to deal with. Ideally, a policy is written using a language that makes it easy for administrators to specify policies. These policies must then be mapped to a form (e.g., mathematical model) understood by management processes. There has been considerable work (e.g., [16, 19, 15]) in network management where policies are mapped to network device configurations that are then sent as instructions to management processes for configuring network devices. This work primarily focusses on network devices and primarily considers mappings to configurations of a single network device. Our work differs in that we are examining mappings from policies to a mathematical model that will be evaluated by management processes. We then must take the results of the mathematical models and translate them into instructions for management processes that will span multiple, heterogeneous devices.
One language, Ponder, has been found to be the best of the existing policy languages (a survey can be found in [17]). Ponder is excellent for specifying rules but less feasible for specifying policies such as service-level objectives or system-level objectives. However, policies expressing system-level objectives (e.g., minimise violations of service level objectives), configuration and resource conflict policies can be rather difficult to express using rules. There are examples of specification languages that can be used to express service level objectives and the penalties associated with not satisfying these service level objectives. One example is WS-Agreement, which has been proposed by the Global Grid Forum. We will study the different representations of policies. It is quite possible that each type of policy is best expressed using a different specification language. This research will examine this in more detail.
If the policies are specified using some sort of specification language (or languages) then a policy template is needed to specify the information to be extracted from these policies and their mapping to an optimisation model and to a configuration of management services that are used to enforce policies. A language is needed that can be used by an administrator to specify the policy template. The template is an ordered set of transformations of the higher-level policy to a lower-level representation. One lower-level representation is the optimisation model. A policy template could then be used to take the results from solving the optimisation model and map them to instructions for management processes. This is a subject of further research that will be based on research found in [20]. The advantage is that users can specify policies using a “friendly” language and not have to specify the mathematical model.
5. MANAGEMENT SYSTEM SERVICES In the previous section we defined a need to be able to map from different levels of policies to instructions for management processes. The management architecture that is the starting point is based on the IETF policy management architecture. Components include the following: (1) A policy decision point (PDP) where management decisions are made. An associated policy repository and rule-base/evaluation engine is used to make decisions based on monitored data; (2) The Policy Enforcement Point (PEP) performs the actions as specified by the PDP. There can be multiple PEPs.
The services to be examined include the following: (1) Services supporting the mapping process. This includes mapping from the specifications of policies to the mathematical models as well as a mapping to instructions that are used to configure resources; (2) Effective resource management includes providing services for effective software licensing, which requires the tracking of software usage and identification of the user to protect software against unauthorised use. Currently, licence management has been limited in that it does not take into account changing licence practices due to customer pressure for flexible licensing, which increasingly uses usage-based licensing as well as perpetual software licences [11]; (3) When a service level objective is violated, the root cause of the violation should be determined. Based on our current experience with Tivoli, we anticipate that services for monitoring and control mechanisms are already provided.
6. DISCUSSION AND CONCLUSIONS
IBM’s views of autonomic computing and grids suggest that self-managed systems are partially characterised by self-configuration, which refers to the management system being able to dynamically change, and self-optimisation, which refers to the monitoring and tuning of resources automatically. We have begun work on these issues. In section 2 we describe the types of policies that may exist; they come from different sources, are at different levels of detail and may be specified using different languages. These policies change over time. It is not feasible to assume that service level objectives are immutable or that resource conflict policies that specify constraints on resource allocation do not change over time. By studying the relationship between policies we are able to facilitate the mapping of policies to instructions for the management entities. As policies change the mapping can be redone or partially redone to accommodate the changes. The goal of our work is to achieve this functionality.
7. REFERENCES
[1] D. Anderson, J. Cobb, E. Korpela, M. Lebofsky and D. Werthimer. SETI@home: An Experiment in Public-Resource Computing. Communications of the ACM, Volume 45, no 11.
[2] H. Abdu, H. Lutfiyya and M. Bauer. A Framework for the Efficient Management of Distributed Systems. Accepted to appear in Elsevier Journal on Computer Communications.
[3] A. Dan, C. Dumitrescu and M. Ripeanu. Connecting Client Objectives with Resource Capabilities: An Essential Component for Grid Service Management Infrastructures. 2nd International Conference on Service Oriented Computing (ICSOC).
[4] Y. Diao, N. Gandhi, J. Hellerstein, S. Parekh, and D. Tilbury. MIMO Control of an Apache Web Server: Modeling and Controller Design. American Control Conference, 2002.
[5] D. Dolgov and E. Durfee. Optimal Resource Allocation and Policy Formulation in Loosely-Coupled Markov Decision Processes. Proceedings of the American Association for Artificial Intelligence, 2004.
[6] N. Gandhi, S. Parekh, J. Hellerstein, and D. Tilbury. Feedback Control of a Lotus Notes Server: Modeling and Control Design. American Control Conference, 2001.
[7] The Dawning of the Autonomic Computing Era. IBM Systems Journal, Volume 42, No 1, March 2003.
[8] C. Goh. A Generic Approach to Policy Description in System Management. HP Laboratories Bristol Technical Report HPL-97-82, 1997.
[9] C. Goh. Policy Management Requirements. HP Laboratories Bristol Technical Report HPL-98-64, 1998.
[10] S. Graupner, V. Machiraju, A. Sahai and A. van Moorsel. Management += Grid. DSOM 2003.
[11] K. Amy, G. Stephen and S. Laurie. The Future of Software Licensing: Software Licensing Under Siege. Retrieved from: http://www.idc.com/groups/software licensing/index.htm, March 2004.
[12] R. Levy, J. Nagarajao, G. Pacifici, M. Spreitzer, A. Tantawi and A. Youssef. Performance Management for Cluster Based Web Services. IM 2003, March 2003.
[13] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications. 1998.
[14] C. Marsan. Grid Vendors Target Corporate Applications. Network World Fusion, January 27, 2003.
[15] P. Martinez, M. Brunner, J. Quittek, F. Strauss, J. Schoenwaelder, S. Mertens, and T. Klie. Using the Script MIB for Policy-Based Configuration Management. Proceedings of the IFIP/IEEE Symposium on Network Operations and Management Symposium (NOMS’02), Florence, Italy, pages 461–468, April 2002.
[16] A. Prieto and M. Brunner. Sls to Diffserv Configuration Mappings. Proceedings of the 12th International Workshop on Distributed Systems: Operations and Management DSOM’2001, Nancy France, October 2001.
[17] G. Stone, B. Lundy, and G. Xie. Network Policy Languages: A Survey and New Approaches. IEEE Network, 15(1):10–21, January 2001.
[18] R. Thomas. IBM and Schwab Tag Team on Linux Grid Computing. Linux Electrons, January 2004.
[19] P. Trimintzios, I. Andrikopoulos, G. Pavlou, and C. Cavalcanti. An Architectural Framework for Providing QoS in IP Differentiated Services Networks. Proceedings of the 7th IEEE/IFIP Symposium on Integrated Network Management (IM’01), Seattle USA, May 2001.
[20] N. Muruganantha and H. Lutfiyya. Issues in Policy Specification, Distribution and Architecture for Quality of Service Management. Integrated Network Management Volume VIII, March 2003.
[21] C. Santos, A. Sahai, X. Zhu, D. Beyer, V. Machiraju and S. Singhal. Policy-Based Resource Assignment in Utility Computing Environments. DSOM 2004.
[22] R. Wies. Using a Classification of Management Policies for Policy Specification and Policy Transformation. Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, 1995.