A Service Management Scheme for Grid Systems

Wei Li, Zhiwei Xu, Li Cha, Haiyan Yu, Jie Qiu, Yanzhe Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{liwei, zxu, char, yuhaiyan, zhangyanzhe}@ict.ac.cn, [email protected]

Abstract. In this paper, we propose a service management scheme named the Grid Service Space (GSS) model, which provides application developers with a high-level logical view of grid services and a set of primitives to control the full lifecycle of a grid service. The GSS model offers a novel approach to meeting the management requirements of large-scale service-oriented grids, including location-independent service naming, transparent service access, fault tolerance, and controllable service lifecycle.

1. Introduction

Physical resource independence, such as location transparency and access transparency, is a general design principle for resource management in distributed systems. With the emergence of grid computing, distributed resources are abstracted as Grid Services [3], an abstraction that aims to hide the heterogeneity of resources and focuses on standardizing interface descriptions, access semantics, and information representations. Strictly speaking, the current definition of a grid service does not make distributed resources fully virtual, because it relies on a location-dependent naming mechanism (e.g., the OGSA [3] framework uses a URL-based naming scheme to indicate a service instance’s physical location) and lacks transparent service access mechanisms. Under such circumstances, developers have to spend extra effort on general-purpose resource management work such as service discovery, scheduling, and error recovery. Another problem is that a developer has to modify an application whenever the URL-based name of a service changes. How to achieve complete physical resource independence remains a challenge for grid resource management.

From traditional operating system design, we know that virtualization technologies such as virtual memory [1] and virtual file systems [6] are common ways to obtain physical resource independence. Virtual memory fulfills the requirements of dynamic storage allocation, i.e., the desire for program modularity, machine independence, dynamic data structures, elimination of manual overlays, etc. A virtual file system accommodates multiple file system implementations within a single operating system kernel, which may encompass local, remote, or even non-UNIX file systems.

To obtain full physical resource independence, we adopt a service management scheme called the Grid Service Space (GSS) model, which draws on the experience of virtual memory and virtual file systems. With this model, a programmer can refer to a service by a location-transparent name without knowing the service's location, status, capabilities, etc. Hence, the runtime system can provide several benefits, such as load balancing (by choosing lightly loaded services), fault tolerance (by switching to a new service in response to a service failure), and locality of service access (by locating a nearer service).

The paper is organized as follows. Section 2 analyzes the requirements for grid service management. Section 3 presents a detailed description of the GSS model. Section 4 introduces the implementation, and Section 5 concludes the paper.

2. Requirements for Service Management

In current grid research, most effort has been put into standardizing physical resources as grid services. Just as a traditional operating system manages the use of hardware, a grid operating system (GOS) is a natural solution for managing the use of grid resources. More precisely, a GOS is a runtime system that can manage heterogeneous, distributed, and dynamic resources efficiently. To realize such a GOS, it is necessary to analyze the lifecycle of a grid application carefully. This lifecycle can be divided into a programming phase and a runtime phase.

In the programming phase, a programmer needs to integrate various services to solve a problem. In most cases, programmers do not care about the location of services (i.e., where a task is executed). From the programmer's point of view, services should be physical resource independent, and a programmer should be able to refer to a service just by a unique name and a description of the desired attributes.

In the runtime phase, a running program often encounters problems such as resource scheduling, error recovery, and task migration. A GOS should provide transparent service access mechanisms, including service discovery, error recovery, and lifecycle control, to reduce the burden of handling these issues.

From the above analysis, we can summarize the main requirements of a service management system as physical resource independent naming, transparent service discovery and scheduling, service lifecycle control, and fault tolerance. In addition, the GOS should also consider implementation issues such as resource topology, programming language support, performance, and reliability.

3. The GSS Model

The GSS model is proposed to abstract and define the key concepts of a service management system. In this model, the basic elements are virtual services and physical services, which make up the Virtual Service Space (VSS) and the Physical Service Space (PSS) respectively. The difference is that a virtual service is a logical representation of a computational resource, while a physical service is a computational entity with a network access interface. Functional equivalence of multiple services is indicated by a coessential identifier: services with the same coessential identifier have the same processing function (though they may have different capabilities and attributes). A mapping between two coessential services is called a coessential mapping. A coessential mapping from a virtual service to a physical service is called a scheduling mapping, and a coessential mapping from a virtual service to another virtual service is called a translating mapping. For a given virtual service with a certain coessential identifier, all physical services that have the same coessential identifier form the discoverable set of this virtual service. All physical services in a discoverable set are candidates for this virtual service to bind to.

In addition to the above basic definitions, we introduce the Virtual Service Block (VSB), which is a subset of a VSS that groups related services together. We also provide a set of primitives for service lifecycle control. Several service states are defined to indicate the different phases of a service lifecycle, and a service switches between states via the lifecycle control primitives.

3.1. Formal Definitions

Definition 1. A Service Space is a set denoted by S = {s1, s2, …, sh}, where each si (1 ≤ i ≤ h) is the name of a service.

Definition 2. The services in a service space S are divided into two types, denoted by the set L = {vs, ps}: vs represents a virtual service, whose name is location independent, and ps represents a physical service, which has a locatable address. We denote the type of a service si by A(si), where 1 ≤ i ≤ h. A service space V is called a Virtual Service Space of S if V⊆S and A(si) = vs for each service si∈V. A service space P is called a Physical Service Space of S if P⊆S and A(si) = ps for each service si∈P.

Definition 3. If s1, s2, …, si∈S have the same function, we say that s1, s2, …, si are coessential services. The function is expressed by a coessential identifier e. All coessential identifiers form a set ES = {e1, e2, …, em}, m ≤ h, which is called the coessential identifier set of service space S. Each si∈S has one and only one coessential identifier; that is, there is a mapping χ: S → ES. If two services si and sj are coessential, then χ(si) = χ(sj) = ek for some ek∈ES.

Definition 4. For a service space Sx⊆S and a coessential identifier ek∈ES, the set C(Sx, ek) = {sj | sj∈Sx and χ(sj) = ek} is called the coessential service set of ek for Sx. For every service space Sx⊆S, C(Sx, e1)∪C(Sx, e2)∪…∪C(Sx, em) = Sx, and for any two coessential service sets C(Sx, ex) and C(Sx, ey) with ex≠ey, C(Sx, ex)∩C(Sx, ey) = ∅.

Definition 5. A mapping ω: C(Sx, ek) → C(Sy, ek), where Sx⊆S and Sy⊆S, is called a coessential mapping of ek from Sx to Sy. In particular, a coessential mapping of ek from a VSS V to a PSS P is called a scheduling mapping, and for each virtual service s∈C(V, ek), the set C(P, ek) of physical services coessential with s is called the discoverable set of s. Likewise, a coessential mapping of ek from one VSS V to another VSS V’ is called a translating mapping, and for each virtual service s∈C(V, ek), the set C(V’, ek) is called the translatable set of s.

3.2. Semantics of GSS Management

Service naming mechanisms. The definitions of virtual service and physical service do not give the semantics of a service name explicitly. For a virtual service, the only requirement on its name is that it differentiates the service from the other virtual services in the same VSS. Location independence means that the name contains no physical resource information and must be translated to a locatable resource before a program can access the virtual service. In our model, a virtual service can use a code-based name or a string-based name, which can be user-friendly or even semantic-based. With the location-independent naming mechanism of the VSS, programmers can develop applications at a virtual resource layer. For a physical service, the GSS model does not restrict the naming mechanism as long as the service name indicates a locatable address. For example, an IP address and a TCP port can be used to identify a service. The URL-based naming mechanism in the OGSA framework guarantees the global uniqueness of grid services and gives each resource a locatable address.

Virtual Service Block. Application developers normally need to organize a group of related services together and to refer to this group by a name. Similar to virtual memory design, a service management scheme should fulfill objectives such as program modularity, module protection, and code sharing. The Virtual Service Block (VSB) achieves these objectives. In the GSS model, a VSS is composed of a set of named VSBs. Each block is an array of service names. A service name in a blocked VSS consists of two components (b, s), where b is the name of a VSB and s is a unique name within b.
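The two-component naming just described can be illustrated with a minimal Python sketch. The class names `VirtualServiceBlock` and `BlockedVSS` and the sample service names are hypothetical and chosen only for illustration; the paper does not prescribe a data structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualServiceBlock:
    """A block of virtual service names; each name is unique within the block."""
    services: set[str] = field(default_factory=set)


class BlockedVSS:
    """A VSS organised as named VSBs, so a service name is a pair (b, s)."""

    def __init__(self) -> None:
        self._blocks: dict[str, VirtualServiceBlock] = {}

    def add_block(self, b: str, block: VirtualServiceBlock) -> None:
        # b is the block name used inside this particular VSS.
        self._blocks[b] = block

    def contains(self, b: str, s: str) -> bool:
        # A name (b, s) is valid when block b exists and holds service s.
        block = self._blocks.get(b)
        return block is not None and s in block.services
```

For instance, after `vss.add_block("math", VirtualServiceBlock({"fft", "solver"}))`, the name `("math", "fft")` resolves while `("math", "sort")` does not.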
The first objective, program modularity, can be achieved by allocating each module to one VSB. Other programs can then easily share the module just by changing the block name, while the names within the block remain unchanged. The second objective, module protection, can be achieved by adding extra information and a checking mechanism: each VSB can record information such as the block owner and access rights to implement space protection. The third objective, code sharing, is achieved by mapping one VSB into multiple VSSs under different block names, so that multiple programs can share the code of modules in other programs.

Virtual-Physical Service Space Mapping. Virtual-physical service space mapping in the GSS model is more complex than memory mapping. Although a VSS is similar to a virtual memory space, the PSS differs greatly from a physical memory space because of its autonomous control and huge size. These two features cause several difficulties for efficient service space mapping. The first makes it hard for anyone except the service owner to deploy a physical service to a specific address. The second may degrade the performance of service locating because of the huge search space of the PSS. In the GSS model, we use the coessential mapping mechanism formally described in Definition 5, parallel pre-mapping technology, and discoverable sets to address these problems.

When mapping a virtual service to a physical service, we should consider two important issues: correctness and performance. Correctness is guaranteed by following the definitions of the GSS model. Performance can be improved by better organization of physical services, efficient service locating, and scheduling policies; several research efforts [2] [4] [5] have concentrated on these issues. We exploit parallel pre-mapping together with VSBs to improve the overall performance of service space mapping by hiding the service locating time. The idea is to keep locating and mapping multiple physical services for a group of given virtual services (such as all virtual services in a VSB) in parallel before a running program actually refers to these virtual services. In addition, for each coessential identifier, we use the discoverable set defined in Definition 5 to build a small search space for service mapping, which also reduces the search time.
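As a rough illustration of how a discoverable set narrows the search space, the following Python sketch partitions a service space by coessential identifier and selects the physical candidates for one virtual service. The dictionary `chi` stands in for the mapping χ, and all service names, addresses, and identifiers are made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping chi: service name -> coessential identifier.
chi = {
    "blast": "e_blast",                # virtual services
    "render": "e_render",
    "http://hostA/blast": "e_blast",   # physical services
    "http://hostB/blast": "e_blast",
    "http://hostC/render": "e_render",
}
V = {"blast", "render"}
P = {"http://hostA/blast", "http://hostB/blast", "http://hostC/render"}


def coessential_sets(space, chi):
    """Partition a service space: identifier e -> coessential service set C(space, e)."""
    sets = defaultdict(set)
    for s in space:
        sets[chi[s]].add(s)
    return dict(sets)


def discoverable_set(virtual_service, physical_space, chi):
    """Physical services sharing the virtual service's coessential identifier."""
    e = chi[virtual_service]
    return {p for p in physical_space if chi[p] == e}
```

Here a lookup for `"blast"` only has to consider the two coessential physical services instead of all of `P`. Because every service has exactly one identifier, the sets returned by `coessential_sets` are disjoint and their union is the whole space.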

Fig. 1. Using Parallel Pre-mapping technology together with VSB to hide the searching time.

Fig. 2. Using discoverable set to reduce search space.

Figure 1 illustrates the parallel pre-mapping technology used in service space mapping. When loading a program, the GOS first maps the virtual services of several VSBs in parallel. When the program starts to run, it can directly access the mapped physical services. At the same time, the GOS continues to map the virtual services of subsequent VSBs in parallel.
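The pre-mapping idea can be sketched in Python with a thread pool. This is only an illustration under stated assumptions: `PreMapper` is a hypothetical name, and `locate` is an assumed caller-supplied function that maps a virtual service name to a physical service name; the actual Vega GOS mechanism is not specified at this level of detail.

```python
from concurrent.futures import ThreadPoolExecutor


class PreMapper:
    """Start locating physical services before the program refers to them."""

    def __init__(self, locate, max_workers=4):
        self._locate = locate          # assumed: virtual name -> physical name
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}             # virtual name -> Future

    def premap_block(self, virtual_names):
        # Submit one lookup per virtual service; the lookups run in parallel.
        for name in virtual_names:
            if name not in self._pending:
                self._pending[name] = self._pool.submit(self._locate, name)

    def resolve(self, name):
        # If the lookup was started earlier, most of its latency is already hidden.
        if name not in self._pending:
            self.premap_block([name])
        return self._pending[name].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A loader would call `premap_block` for the first few VSBs, let the program run, and keep calling it for later blocks; `resolve` then blocks only when a lookup has not yet finished.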

Figure 2 illustrates the use of a discoverable set to reduce the search space for a virtual service. The GOS builds a discoverable set for each coessential identifier before loading programs. When a program is loaded, search operations can be performed within a relatively small discoverable set. The parallel pre-mapping and discoverable set technologies can be used together to improve the overall performance of service space mapping.

3.3. Service Lifecycle Control

Compared to physical memory access, the lifecycle of service access is more complex. When a user accesses a service, there may be a single send/receive operation or multiple send/receive operations, whereas in a virtual memory system accessing a memory cell takes a fixed time and the access mode is predetermined. For service access to behave correctly and deterministically, lifecycle control of services is needed. In the GSS model, the different capabilities and properties of virtual and physical services imply that they have different lifecycle patterns, so different control primitives are needed to manage the state transitions of virtual services and physical services respectively. Here we mainly introduce the lifecycle control of virtual services. When a programmer refers to a virtual service, he or she needs to know not only a location-independent name but also the full process of service access. In this section, we provide a set of primitives to control the activities of a virtual service.

To describe the lifecycle control of a virtual service properly, we use a more concrete entity called mService to represent a virtual service. An mService is defined as a tuple m = (n, e, i, ω, V, p, st), where n is a unique service name in a VSS, e is the coessential identifier, i is the session identifier, ω is the coessential mapping, V is a VSS, p is a physical service name, and st is a service state indicator, which is an element of the set ST = {Created, Binded, Running, Waiting, Terminated}.
The lifecycle of a virtual service includes several relevant operations, such as virtual service creation, service discovery and scheduling, and session control. The lifecycle control primitives for virtual services are summarized as follows:

– create (n), performed when we create a new virtual service and start up a new session with it. After this operation, the state of the mService is st = Created and a session identifier i is returned.
– open (n), performed when we reopen an existing virtual service that is out of session. After this operation, the state of the virtual service remains unchanged and a session identifier i is returned.
– delete (n), performed when we remove a virtual service from a VSS. After this operation, the virtual service with name n is deleted from this VSS.
– bind (i, e, ω), performed when we map a physical service to a virtual service. After this operation, ω (n) = p and st = Binded. In addition, the virtual service n is added to VSS V and the coessential identifier e is added to EV.
– invoke (i), performed when we call a method of a virtual service. After this operation, st = Running.
– sleep (i), performed by the program or the GOS kernel. After this operation, st = Waiting.
– interrupt (n, i), performed when an external event occurs. After this operation, st = Running.
– close (i), performed when we cut off the current session with a virtual service. After this operation, the virtual service is out of session and users cannot interact with it until the open primitive is used to create a new session.
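The state changes listed above can be sketched as a small state machine in Python. The resulting states follow the text; the permitted source state of each primitive is an assumption of this sketch, since the paper does not spell it out, and the text does not say which primitive leads to Terminated, so no transition into that state is modelled. Session handling (open, close) and delete are omitted.

```python
from enum import Enum


class State(Enum):
    CREATED = "Created"
    BINDED = "Binded"        # spelling follows the state set ST in the text
    RUNNING = "Running"
    WAITING = "Waiting"
    TERMINATED = "Terminated"


# primitive -> (assumed allowed source states, resulting state from the text)
TRANSITIONS = {
    "bind": ({State.CREATED}, State.BINDED),
    "invoke": ({State.BINDED}, State.RUNNING),
    "sleep": ({State.RUNNING}, State.WAITING),
    "interrupt": ({State.WAITING}, State.RUNNING),
}


class MService:
    """A reduced mService holding only (n, e, p, st)."""

    def __init__(self, name, coessential_id):
        self.n = name
        self.e = coessential_id
        self.p = None                  # no physical service bound yet
        self.st = State.CREATED        # effect of create(n)

    def apply(self, primitive, physical_name=None):
        sources, target = TRANSITIONS[primitive]
        if self.st not in sources:
            raise ValueError(f"{primitive} is not allowed in state {self.st.value}")
        if primitive == "bind":
            if physical_name is None:
                raise ValueError("bind needs a physical service name")
            self.p = physical_name     # records the mapping of n to p
        self.st = target
        return self.st
```

A typical sequence would be `bind`, `invoke`, `sleep`, `interrupt`, leaving the service in the Running state; calling `sleep` on a freshly created service raises an error under the assumed source states.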

4. Implementations

The GSS model is a key feature of the Vega GOS in the Vega Grid project [8] [9], which aims at learning the fundamental properties of grid computing and developing key techniques that are essential for building grid systems and applications. The Vega GOS is also used to build a testbed called China National Grid (CNGrid), which is sponsored by the 863 program and aims at integrating the high-performance computers of China to provide a virtual supercomputing environment.

Fig. 3. The layered architecture of Vega Grid.

The architecture of Vega Grid conforms to the OGSA framework; Figure 3 shows its layered architecture. At the resource layer, resources are encapsulated as grid services or web services. The GOS layer aggregates these services and provides a virtual view for developers, who can use the APIs, utilities, and development environments provided by Vega GOS to build VSS-based applications.

At the resource layer, the most important work is deploying and publishing a physical service to the upper layers. In our implementation, each physical service should have exactly one coessential identifier. After generating the coessential identifier for a physical service, we register the service with a resource router, together with the coessential identifier and any other information needed. According to the algorithm in [7], the physical service is then published to all resource routers, so that every GOS can learn of its existence.

At the GOS layer, the resource router plays an important role in locating resources. The current implementation of Vega GOS is developed as grid services as specified in [3]. In addition, Vega GOS implements the virtual service lifecycle management defined in Section 3.3. As a fully functional integrated system, the Vega GOS also addresses implementation issues such as security, user management, and communication, which are not covered in this paper.
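The registration and publishing step can be illustrated with a toy Python sketch. It does not implement the routing-transferring algorithm of [7]; as a stand-in, it uses naive flooding between directly connected routers. The class `ResourceRouter` and its methods are hypothetical names for illustration.

```python
class ResourceRouter:
    """Toy router: stores registrations and floods them to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.table = {}   # coessential identifier -> set of physical addresses

    def connect(self, other):
        # Links are symmetric in this sketch.
        self.peers.append(other)
        other.peers.append(self)

    def register(self, coessential_id, address):
        known = self.table.setdefault(coessential_id, set())
        if address in known:
            return            # already seen here, which stops the flood
        known.add(address)
        for peer in self.peers:
            peer.register(coessential_id, address)

    def lookup(self, coessential_id):
        # Return a copy so callers cannot modify the router's table.
        return set(self.table.get(coessential_id, ()))
```

With routers connected in a chain a, b, c, a registration made at `a` becomes visible through `c.lookup(...)`, which mirrors the statement that every GOS can learn of a published physical service.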

At the application layer, programmers can use the GOS APIs to build a custom application. We also provide a GUI tool called GOS Client based on GOS APIs to help users to utilize the services in Vega Grid.

5. Conclusions and Future Work

We have discussed the issues of grid service management. To overcome the obstacles in grid application development and system management, the GSS model is proposed to provide developers with location-independent naming, transparent service access, and service lifecycle control. As a fundamental component of our service management scheme, the GSS model also supports other research work, such as a grid-based programming model. We are currently implementing the Vega GOS and the GSS model on the CNGrid testbed. We hope that the practical operation of Vega GOS and its applications can verify the basic concepts and technologies of the GSS model.

References

[1] P. J. Denning, “Virtual Memory”, ACM Computing Surveys, vol. 2:3, pp. 153-189, 1970.
[2] S. Fitzgerald et al., “A Directory Service for Configuring High-Performance Distributed Computations”, Proc. 6th IEEE Symposium on High Performance Distributed Computing, pp. 365-375, 1997.
[3] I. Foster et al., “Grid Services for Distributed Systems Integration”, Computer, pp. 37-46, 2002.
[4] A. Grimshaw et al., “Wide-Area Computing: Resource Sharing on a Large Scale”, Computer, pp. 29-37, 1999.
[5] A. Iamnitchi et al., “On Fully Decentralized Resource Discovery in Grid Environments”, International Workshop on Grid Computing, 2001.
[6] S. R. Kleiman, “Vnodes: An architecture for multiple file system types in Sun UNIX”, USENIX Association Summer Conference Proceedings, pp. 238-247, 1986.
[7] W. Li et al., “Grid Resource Discovery Based on a Routing-Transferring Model”, 3rd International Workshop on Grid Computing (Grid 2002), LNCS 2536, pp. 145-156, 2002.
[8] Z. Xu et al., “Mathematics Education over Internet Based on Vega Grid Technology”, Journal of Distance Education Technologies, vol. 1:3, pp. 1-13, 2003.
[9] Z. Xu et al., “A Model of Grid Address Space with Applications”, Journal of Computer Research and Development, 2003.