A New Approach to Match Operating Systems to Application Needs
C. Eckert and H.-M. Windisch
Munich, University of Technology, Department of Computer Science, 80290 Munich (Germany)
e-mail : feckertc,
[email protected]
Reprint: Proceedings of the 7th IASTED - ISMM International Conference on Parallel and Distributed Computing and Systems, October 19-21, 1995, Washington, USA, pp. 499-503
Abstract
Today, interconnected workstations provide sufficient computing power to execute complex distributed applications. But programming distributed applications is still a cumbersome and error-prone task. Hence, bridging the gap between application programmers and hardware to provide completely transparent resource management is still a great challenge for operating system designers. We present a top-down, language-based approach to integrate application development and operating system design. Our approach results in a system which overcomes many deficiencies of existing systems. The system offers adaptable application programming interfaces as well as adaptive, distributed and transparent resource management services. That is, with our new approach we are able to match operating systems to application needs.
Keywords: Operating Systems, Architecture, Programming Languages
1 Introduction
Today, interconnected workstations provide sufficient computing power to execute complex distributed applications. But programming distributed applications is still a cumbersome and error-prone task (e.g. with PVM [1] or DCE [2]). The lack of transparency forces the programmer to acquire detailed knowledge about operating system concepts and services, library modules, and the physically distributed hardware configuration in order to implement distributed applications. Hence, it is still a great challenge for operating system designers to provide a programming environment which enables application programmers to use the computing power offered by current hardware in an easy and efficient way. The application programmer should be able to concentrate on solving the application problem without being bothered by problems concerning the distributed nature of the underlying system. That is, management of all system resources should be done completely transparently by the distributed operating system. As in the days of monolithic computing systems, distributed operating system designers are still faced with the task of bridging the gap between application programmers and hardware. But only minor advances in this area can be observed.
Most of the currently existing distributed operating systems are characterized by their bottom-up design based on host operating systems such as UNIX (e.g. OSF DCE). Management of the system resources is enforced by a set of servers which provide simple and general abstractions, for instance, heavy-weight processes, or files to implement persistent data objects. The application programming interface (API) offered by these operating systems is static and inflexible. The API cannot be adapted to dynamically changing requirements of applications and its functionality cannot be enhanced dynamically. The task of programming distributed applications is still very complicated, because the application programmer is faced with a heterogeneous set of concepts comprising programming language concepts as well as simple and general operating system concepts. For instance, if persistent data is needed, the programmer is forced to implement the application by using the file concept of the underlying operating system with low-level read and write operations, instead of using an appropriate language construct that hides implementation details and offers facilities to program persistent objects exporting high-level methods. Moreover, the mechanisms and abstractions offered by the operating system are not well adapted to the concepts of distributed programming languages. This may lead to very inefficient realizations of the language concepts. Mapping fine-grained activities onto heavy-weight UNIX processes (consider, for instance, the mapping of Emerald objects [3]) is an obvious example of the mismatch between operating system concepts and language concepts. Proper language paradigms and concepts as well as new operating system architectures are needed to provide transparent, distributed, and adaptive resource management.
In the MoDiS (Model oriented Distributed Systems) project we have chosen a top-down and language-based approach to develop a new operating system architecture to efficiently bridge the gap between application programmers and hardware and to match operating systems to application needs. The paper is organized as follows. In section 2 we will introduce the overall architecture of our approach. Section 3 presents the reflective manager architecture and policies to realize efficient object accesses. Our experiences with a first prototype implementation are finally discussed. Section 4 summarizes the main features of our approach and gives an outlook on our future work.
2 MoDiS-Architecture
Within the MoDiS project we are developing a new operating system architecture. The approach is characterized by the integration of operating system objects and services with application programs into one system using the same language concepts for application programs and system services.
2.1 Language-based Approach
The heart of the approach is the object-based programming language INSEL which provides a homogeneous set of language concepts. In addition to the known concepts of object-based languages, INSEL offers a language construct to define active objects of any granularity. Active objects are called actors in INSEL. During the execution of an INSEL-program¹ active and passive INSEL-objects can be dynamically created and deleted. Actors may cooperate by synchronous method invocations or by using shared passive objects. INSEL-class-objects may be nested and INSEL-objects may be simple or complex, i.e. compound objects. Based on class nesting, different dependencies between objects are implicitly established. For instance, nesting makes it possible to restrict the scope of objects according to visibility rules known from block-oriented programming languages like Ada, i.e. object accesses are a priori restricted. This structuring feature is a very important issue in our approach. The set of INSEL-objects is structured according to different dependencies, which goes far beyond the hierarchical dependency between objects based on class hierarchies known from object-orientation.
The termination dependency is another example of a relationship between INSEL-objects. With respect to its existence each INSEL-object depends conceptually on exactly one other object. That is, if object a depends on object b with respect to its existence, then deleting object b causes deletion of object a, too. Different but well defined termination dependencies enable us to overcome the usual inflexible classification of objects into persistent and non-persistent ones. Based on structural dependencies we can associate a set of dependent passive objects with each actor, called the actor context. Restricting the system description to the set of actor contexts, neglecting passive objects, reduces the complexity of the system design. Furthermore, the well defined actor contexts are proper units for resource management.
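The termination dependency and the actor context can be illustrated with a minimal Python sketch. This is not INSEL syntax; the class `InselObject` and its attributes are hypothetical names chosen for illustration. The sketch assumes that the actor context of an actor stops at nested actors, which are taken to own their own contexts.

```python
class InselObject:
    """Hypothetical stand-in for an INSEL-object (not INSEL syntax)."""

    def __init__(self, name, depends_on=None, active=False):
        self.name = name
        self.active = active          # True for actors, False for passive objects
        self.alive = True
        self.depends_on = depends_on  # exactly one other object (None for the root)
        self.dependents = []
        if depends_on is not None:
            depends_on.dependents.append(self)

    def delete(self):
        """Delete this object and, transitively, every object that depends on it."""
        deleted = []
        stack = [self]
        while stack:
            obj = stack.pop()
            if obj.alive:
                obj.alive = False
                deleted.append(obj.name)
                stack.extend(obj.dependents)
        return deleted

    def actor_context(self):
        """Names of the passive objects that depend on this actor.

        Assumption: nested actors and their dependents are excluded,
        because they form actor contexts of their own.
        """
        context = []
        stack = list(self.dependents)
        while stack:
            obj = stack.pop()
            if not obj.active:
                context.append(obj.name)
                stack.extend(obj.dependents)
        return sorted(context)
```

With an actor `main` that owns a passive object `buffer` (which in turn owns `item`) and a nested actor `worker`, deleting `worker` removes only the objects depending on `worker`, whereas deleting `main` removes everything that is still alive below it.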
2.2 Top-down Approach
According to the top-down approach in MoDiS, the mechanisms and concepts provided by the operating system to realize INSEL-objects are systematically derived from the properties of the language concepts and are well adapted to them. To ensure great flexibility, the operating system should offer a broad spectrum of such mechanisms and it should be able to choose the optimal realization for a given INSEL-object. The selection of appropriate mechanisms should be transparent for the user, that is, the operating system must be able to gain the information needed for a proper decision by analyzing the structural dependencies between objects. For instance, consider a remote method invocation. The operating system should be able to choose, for instance, among (1) sending a message followed by remote execution of the method, (2) constructing a local replica of the called object, or (3) migrating the called object to the caller.
The main problem of the operating system is to manage the system resources efficiently and to adapt the management to the dynamically changing requirements of applications. We have solved the management problem by associating a manager object with each INSEL-actor, i.e. with each actor context. The task of the manager is to enforce actor-specific resource management, i.e. the functionality of the manager is tailored to the requirements of its associated actor. The set of system resources is partitioned between the manager objects, which cooperate to enforce the distributed resource management. The structural dependencies between INSEL-actors (for instance the termination dependency) are passed to the manager objects. This results in a well structured set of fine-grained managers (servers) which perform operating system services. The top-down approach leads to a systematically structured distributed operating system. A manager object is dynamically created each time an actor is created. Hence, our resource management dynamically adapts to changing resource requirements of applications.
¹ An INSEL-program in execution is called an INSEL-system.
2.3 Overall Architecture
The overall architecture of our system is sketched in figure 1, which shows a snapshot of an INSEL-system. The properties of the hardware, consisting of interconnected workstations, are hidden by a set of basic abstractions of a per-node micro kernel. The broad spectrum of basic INSEL abstractions to realize INSEL-objects is implemented on the next layer. For instance, to implement INSEL-actors, user-level threads (e.g. scheduler activations [4]) of different complexity should be provided. Basic memory management abstractions comprise address space management, distributed shared memory with sequential and release consistency, as well as volatile and persistent segments. In addition, a spectrum of manager objects offering different management policies and different functionalities is needed (e.g. transaction managers, access control managers, or managers which provide persistent data stores).
Based on the INSEL abstraction layer the resource management layer is implemented. A key issue in the MoDiS approach is the exploitation of the structural dependencies between INSEL-objects, which results in management policies adapting to application needs. The manager objects constitute a reflective distributed manager architecture. For instance, the termination dependencies mentioned above enable memory management without performing expensive, global garbage collection. Reducing the problem of false sharing is another example. Based on information about use dependencies between objects, the manager objects are able to allocate objects in disjoint pages, reducing the number of access conflicts. Information about use and cooperation dependencies, in addition to information about current processor load, can be exploited with respect to adaptive load balancing, too.
Due to the language-based integrated approach of MoDiS, INSEL-applications as well as operating system services are integrated into one INSEL-system. That is, operating system services such as authentication or management of persistent data are implemented in INSEL. INSEL-applications are dynamically bound to the existing system. During their execution they may use system objects or objects of other applications. Integrating applications into the existing system dynamically extends the system interface. Application-specific interfaces can be provided. For instance, in figure 1 three different interfaces, given by objects s4, s5 and s7, respectively, are shown. Interface s4 may, for example, offer a secure execution environment for application 1 providing special security services, and interface s5 may, in addition, support a reliable execution of application 2 by providing a transaction facility with check-pointing and recovery. Hence, the holistic top-down approach of MoDiS leads to an operating system adapted to application needs.

[Figure 1: MoDiS Architecture. Labels recoverable from the figure: Applications 1 to 3 on top of the API; the INSEL-program with objects s1 to s7 and the realized INSEL-system; the Reflective Architecture layer (Memory-Management, Actor Management, Actor Communication); the INSEL-Abstractions layer (Basic Memory Management, Basic Actor Management, Basic INSEL-Communication); one µ-Kernel per node (Node 1, Node 2, ..., Node n) providing Kernel Threads, IPC and Input/Output. Legend: Actor context, Manager, "depends on".]
3 Reflective Manager Architecture
As mentioned above, creating an INSEL-actor implies the creation of its associated manager. The manager enforces actor-specific resource management tasks. Due to the distributed nature of INSEL-systems the corresponding resource management is distributed, too. In the following we will focus on object management. Managers cooperate with each other to enforce memory management. Each manager manages the passive objects belonging to the actor context of its associated actor. As an actor context may be physically distributed, a manager proxy is created on each remote node where one of its context objects resides, to avoid remote management of passive objects. The manager proxy is responsible for memory management, for handling remote access requests, and for the termination of all of its context objects that reside on the proxy's node. Thus, remote manager cooperation is minimized, as most of the operations can be performed by the local manager proxy. According to INSEL's termination dependencies, the termination of an actor implies the termination of its associated (possibly distributed) actor context. Utilizing the manager proxy concept, this can be performed most efficiently by simply (asynchronously) requesting the termination of the manager proxies. When all manager proxies have terminated, which implies the termination of all assigned passive objects, the manager terminates, too.
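The interplay of manager and manager proxies can be sketched roughly as follows. The class and method names (`Manager`, `ManagerProxy`, `place`, `terminate`) are hypothetical and do not stem from the prototype. Two simplifications are assumed: proxies are terminated by plain synchronous calls rather than asynchronous requests, and the home node of the actor is treated like any other node.

```python
class ManagerProxy:
    """Manages the context objects of one actor that reside on one node."""

    def __init__(self, node):
        self.node = node
        self.objects = set()
        self.terminated = False

    def terminate(self):
        # Terminating the proxy terminates all passive objects assigned to it.
        released = sorted(self.objects)
        self.objects.clear()
        self.terminated = True
        return released


class Manager:
    """Manager associated with one actor (hypothetical interface)."""

    def __init__(self, actor, home_node):
        self.actor = actor
        self.home_node = home_node
        self.proxies = {}        # node -> ManagerProxy, created on demand
        self.terminated = False

    def place(self, obj, node):
        """Register a passive context object on `node`, creating a proxy if needed."""
        proxy = self.proxies.get(node)
        if proxy is None:
            proxy = self.proxies[node] = ManagerProxy(node)
        proxy.objects.add(obj)
        return proxy

    def terminate(self):
        """Terminate all proxies; the manager terminates once all of them have."""
        released = {node: p.terminate() for node, p in self.proxies.items()}
        self.terminated = all(p.terminated for p in self.proxies.values())
        return released
```

The point of the design is visible in `terminate`: the manager never touches remote passive objects itself, it only addresses one proxy per node, so the number of remote interactions grows with the number of nodes rather than with the number of objects.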
3.1 Implementation of Object Invocations
To implement object invocations the managers have to choose among a spectrum of possibilities such as local invocation via procedure calls, remote invocation via RPC, and some form of migrating caller and called object together. In our approach, this decision is guided by static (i.e. invariant) and dynamic (i.e. changing over time) object properties, rather than by annotations or down-calls made by the programmer. According to a classification of relevant object properties, an assignment of objects or object classes to access mechanisms is statically performed as far as possible. In cases where efficient object access is only feasible with knowledge of object access patterns, adaptive heuristics based on recent access behavior are employed. The heuristics are executed by the managers and are based on runtime information gathered in the INSEL-abstraction layer and the management layer, respectively.
3.1.1 Strategies
Adaptive implementation of object invocations is based on two basic strategies: (1) the migration strategy, which transparently migrates objects when the amount of local access is very low, and (2) the replication strategy, which decides whether an object should be used via RPC or be replicated and called locally.
Migration strategy: The goal of the migration strategy is to minimize the mean access time for an object by monitoring the overall access behavior for the object over a period of time and by migrating the object if there exists a node that is expected to reduce the overall object invocation costs in the future. The algorithm is executed after executing an object's method. Basically, it works in two phases: phase one tries to detect whether there is a potential for migration and, if so, phase two determines a candidate node for migration (if any). The rationale here is to keep memory consumption low during execution of phase one and to do a closer examination during phase two. As the algorithm uses a heuristic, the reduction of mean access time is expected to be suboptimal. However, the simulations we ran indicate that the migration strategy yields the expected performance gains in favorable cases where some processors use an object significantly more than others. In cases where the access pattern does not permit performance improvement by migration, or where migration even decreases performance due to little locality of access, the algorithm self-adapts its parameters to prevent bad behavior.
Replication strategy: Replicating an object (partially or as a whole) can only enhance overall system performance if the object's data are not updated too frequently and if invoking object methods can be done without expensive global synchronization. The basic idea of our replication strategy is to start by using an object via RPC and to monitor the read/write ratio. Then, if this ratio exceeds a certain limit, the object is replicated and further accessed locally. Vice versa, if an object is replicated, the costs of synchronization and preserving consistency are estimated. If a node finds that these costs exceed a certain limit, it switches back to remote invocation. Thus, the strategy is able to adapt to varying access patterns and to choose the invocation mechanism that is likely to minimize object access time.
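Both strategies can be sketched in Python under explicitly stated assumptions. The paper gives neither concrete thresholds nor the decision rules, so all parameters (`min_accesses`, `local_limit`, `read_ratio_limit`, `cost_limit`, `window`), the majority rule for picking a candidate node, and the class names are hypothetical. The read/write ratio is expressed as the fraction of read operations, and the self-adaptation of parameters is not modeled.

```python
from collections import Counter


class MigrationMonitor:
    """Two-phase check, to be run after each method execution on one object."""

    def __init__(self, home, min_accesses=10, local_limit=0.2):
        self.home = home
        self.min_accesses = min_accesses  # hypothetical sample size per phase
        self.local_limit = local_limit    # hypothetical limit for local accesses
        self._restart()

    def _restart(self):
        self.local = 0
        self.total = 0
        self.per_node = None              # only allocated in phase two

    def record(self, caller):
        """Record one access; return a candidate node for migration or None."""
        self.total += 1
        if caller == self.home:
            self.local += 1
        if self.per_node is None:
            # Phase one: two counters only, to keep memory consumption low.
            if (self.total >= self.min_accesses
                    and self.local / self.total < self.local_limit):
                self.per_node = Counter()  # potential detected, enter phase two
            return None
        # Phase two: closer examination with per-node access counts.
        self.per_node[caller] += 1
        seen = sum(self.per_node.values())
        if seen < self.min_accesses:
            return None
        node, hits = self.per_node.most_common(1)[0]
        self._restart()
        # Assumed rule: migrate only to a remote node with a majority of accesses.
        return node if node != self.home and hits > seen / 2 else None


class ReplicationPolicy:
    """Switch one object between remote invocation ("rpc") and a local replica."""

    def __init__(self, read_ratio_limit=0.8, cost_limit=100.0, window=20):
        self.read_ratio_limit = read_ratio_limit  # hypothetical threshold
        self.cost_limit = cost_limit              # hypothetical threshold
        self.window = window                      # accesses per decision
        self.mode = "rpc"                         # start with remote invocation
        self._reset()

    def _reset(self):
        self.reads = 0
        self.writes = 0
        self.consistency_cost = 0.0

    def record(self, is_read, consistency_cost=0.0):
        """Record one method execution; return the mode to use from now on."""
        if is_read:
            self.reads += 1
        else:
            self.writes += 1
        self.consistency_cost += consistency_cost
        if self.reads + self.writes >= self.window:
            read_ratio = self.reads / (self.reads + self.writes)
            if self.mode == "rpc" and read_ratio > self.read_ratio_limit:
                self.mode = "replica"
            elif self.mode == "replica" and self.consistency_cost > self.cost_limit:
                self.mode = "rpc"
            self._reset()
        return self.mode
```

In this sketch a decision is taken once per window of accesses; a smaller window reacts faster to changing access patterns but is more likely to oscillate between the two mechanisms.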
A heuristic to estimate the costs of synchronization and preserving consistency in the case when an object is replicated is based on observing object accesses at method-execution level rather than observing each single access. This is feasible as a weak memory model, namely lazy release consistency [5], is employed. By tying together synchronization and consistency operations for an object, the overhead for replica usage can be estimated by counting the number and size of local messages sent to acquire locks and recent data and to propagate updates. This enables us to specify upper bounds for the cost of replica usage.
3.1.2 Object Properties
Relevant object properties are object access coordination, object access pattern and object size. Object access coordination is categorized into none (N), monitor (M), reader-writer (RW) and complex (C), respectively². If object access is not synchronized (type none), we take into account how the object's methods access local object data. That is, we further distinguish between read-write-objects, read-only-objects, and write-once-objects. An object which has methods to read and write its data is called a read-write-object. An object which only has methods to read its data is called a read-only-object, whereas an object which is only written at creation time (via parameter passing) is called a write-once-object. For read-write-objects we use the migration strategy, whereas objects of the latter types are replicated to permit local accesses. For the monitor and complex types of coordination we choose the migration strategy, because the costs for global synchronization and preserving consistency of replicas often outweigh the performance gains achieved by accessing local object replicas. In contrast, objects of the reader-writer type are subject to the replication strategy, because there is clearly a certain ratio of read to write operations above which performance gains through replication are possible. The correlation between object properties and the employed strategies is summarized in table 1.

               Object properties
               N                          M    C    RW
Strategy       rw    r-only    w-once
migration      x                          x    x
replication          x         x                    x

Table 1: Object properties and employed strategies

Object access patterns are observed as sequences comprising read and write operations. For our strategies it is sufficient to compute the fraction of read operations and the total number of operations.
Object size is important for two reasons. First, if an object's size is smaller than a certain limit, memory overhead for object replication may be too high, so that the migration strategy is employed. Second, the way object migration is realized depends on the size (and the memory layout) of the object. Small objects with fixed size that can be represented contiguously in memory are migrated by simply moving their representation to the destination node. Objects with variable size, such as dynamic data structures, are migrated using on-demand paging to avoid copying all internal objects to the destination node.
² With the complex type of coordination we capture all types of access coordination which cannot be described by either monitor or reader-writer semantics.
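The static assignment of Table 1, together with the size rule, can be written down as a small lookup function. This is only an illustrative sketch: the names `Coordination`, `DataAccess`, `choose_strategy` and the parameter `min_replica_size` are hypothetical, and the paper does not state a concrete size limit.

```python
from enum import Enum


class Coordination(Enum):
    NONE = "N"
    MONITOR = "M"
    READER_WRITER = "RW"
    COMPLEX = "C"


class DataAccess(Enum):
    """How methods access object data; only relevant for Coordination.NONE."""
    READ_WRITE = "rw"
    READ_ONLY = "r-only"
    WRITE_ONCE = "w-once"


def choose_strategy(coordination, data_access=None, size=None, min_replica_size=None):
    """Return "migration" or "replication" following Table 1.

    Assumption: if both `size` and `min_replica_size` are given and the object
    is smaller than the limit, migration is chosen regardless of the table.
    """
    if size is not None and min_replica_size is not None and size < min_replica_size:
        return "migration"
    if coordination is Coordination.READER_WRITER:
        return "replication"
    if coordination is Coordination.NONE:
        if data_access is None:
            raise ValueError("unsynchronized objects need a data access kind")
        if data_access is DataAccess.READ_WRITE:
            return "migration"
        return "replication"
    # Monitor and complex coordination: replica consistency is too expensive.
    return "migration"
```

For example, `choose_strategy(Coordination.NONE, DataAccess.READ_ONLY)` selects replication, while the same object falls back to migration once a size below `min_replica_size` is supplied.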
3.2 Prototype
The prototype implementation has been designed in order to have a test-bed both for the evaluation of basic memory management mechanisms and for the development of new higher-level memory management strategies. As a basic mechanism for code and data sharing we employed a single address space (cf. [6]) by using a distributed shared memory (DSM) protocol (cf. [7]) with page-based sequential consistency. On top of the global address space we implemented a memory manager concept which allows for concurrent fine-grained virtual memory allocation and deallocation. To avoid inter-node cooperation when satisfying memory allocation requests, we equally partitioned the global space among the nodes of the system. A special per-node memory manager, called the root memory manager, is responsible for assigning portions of a node's partition to local memory managers. When an actor and the corresponding actor context are created, a memory manager is created which handles the actor's memory requests. That is, the memory manager is responsible for providing memory for all objects which belong to the actor context. If static analysis of the actor's allocation behavior indicates that only a fixed amount of memory is needed by the actor, then no memory manager is assigned and the actor requests memory from the root memory manager. If a memory manager runs out of memory, it requests more memory from the local root memory manager. For efficiency reasons, requests between memory managers are issued in terms of pages, i.e. at rather coarse granularity.
There are two major advantages of our concept over conventional concurrent memory allocators. First, data needed for representing actors and their associated passive objects is stored by the corresponding memory manager in a page-structured space that is distinct from other actors' spaces. Consequently, migrating an actor to another node (for load balancing purposes) or swapping out a blocked actor's data can be done without causing false sharing. Second, the fact that allocation and deallocation can be done using disjoint portions of local virtual memory improves the potential for concurrent execution of these operations. Though concurrent allocators with high throughput have been proposed (cf. [8]), our memory allocator further enhances performance due to the elimination of locking overhead along with the associated system call overhead for blocking and unblocking actors. Using actor contexts as units for memory management turns out to be suitable for structuring the global address space.
Current work focuses on the integration of basic support for object migration and object replication into the INSEL-abstraction layer and on the implementation of the adaptive management strategies described in section 3.1.1. First performance measurements of the implementation of our replication strategy showed that our strategy scales well and is able to dynamically adapt to varying object access patterns (cf. [9]).
In cases where the object was mainly read (fraction of write operations 20%) the expected breakdown of access time could be observed due to replica usage. When the access pattern changed, i.e. more write operations were encountered, our strategy switched back to remote invocation and hence chose the proper invocation mechanism. Another important aspect is that the algorithm detects access patterns which would cause a continuous switching between replica usage and remote invocation. The algorithm then self-adapts its parameters until a more favorable access pattern is observed.
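The memory manager hierarchy of the prototype (equal partitioning of the global address space, one root memory manager per node, one memory manager per actor context, page-granular requests) can be sketched as follows. The sketch is a simplification under several assumptions: `PAGE_SIZE`, the class names and `pages_per_request` are hypothetical, allocation is a simple bump allocator, and deallocation as well as the static-analysis shortcut are omitted.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class RootMemoryManager:
    """Per-node manager owning one equal partition of the global address space."""

    def __init__(self, node_index, node_count, total_pages):
        pages_per_node = total_pages // node_count
        self.next_page = node_index * pages_per_node
        self.end_page = self.next_page + pages_per_node

    def grant_pages(self, count):
        """Hand out `count` consecutive pages of this node's partition."""
        if self.next_page + count > self.end_page:
            raise MemoryError("node partition exhausted")
        first = self.next_page
        self.next_page += count
        return range(first, first + count)


class ActorMemoryManager:
    """Serves the allocation requests of one actor context."""

    def __init__(self, root, pages_per_request=4):
        self.root = root
        self.pages_per_request = pages_per_request
        self.pages = []       # all pages owned by this actor context
        self.cursor = 0       # next free global byte address
        self.free_bytes = 0   # bytes left in the most recently granted chunk

    def allocate(self, nbytes):
        """Return a global address for `nbytes`; ask the root for pages if needed."""
        if nbytes > self.free_bytes:
            # Requests to the root are issued in whole pages (coarse granularity).
            needed = max(self.pages_per_request, -(-nbytes // PAGE_SIZE))
            granted = self.root.grant_pages(needed)
            self.pages.extend(granted)
            self.cursor = granted[0] * PAGE_SIZE
            self.free_bytes = needed * PAGE_SIZE
        address = self.cursor
        self.cursor += nbytes
        self.free_bytes -= nbytes
        return address
```

Because every actor memory manager only ever hands out addresses from pages it was granted, the page sets of different actor contexts are disjoint by construction, which is the property the text relies on to avoid false sharing when an actor is migrated or swapped out.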
operating system are executed as one single INSEL-system. Consequently, the dynamic binding of new INSEL-classes facilitates both the introduction of new system interfaces and the enhancement of operating system functionality. For an efficient implementation of the INSEL language concepts we designed a distributed reflective manager architecture. Each manager is in charge of choosing among a set of basic INSEL-abstractions to realize its associated INSEL-objects. The decision is guided by knowledge of static and dynamic properties of the objects. That is, resource usage and management is tailored to the specific requirements of INSEL-objects. Experience with our prototype implementation based on Mach 3.0 provided first results, especially on memory management issues. Further work focuses on the integration of language concepts for reliability and security, on the elaboration of our manager concept for all classes of resources, and on the implementation of the whole approach on top of a tailored micro kernel.
References