Implementation of Scheduling Policies in Real-Time Mach
Tatsuo Nakajima and Hideyuki Tokuda
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

1 Introduction
Future advanced applications will require new types of objects, such as continuous media objects and distributed objects [3, 1]. Such requirements will stress an operating system's ability to support various levels of timeliness, reliability, and performance, and a single computational model and resource management policy may not satisfy conflicting user requirements. Therefore, an extensible operating system structure is important in future distributed systems. Traditional operating systems such as Unix support fixed programming models and resource management policies. In the future, application requirements will become dramatically more complex in order to support advanced computing environments such as very large-scale distributed computing, highly parallel computing, reliable computing, or real-time computing. Operating systems that satisfy these requirements should support multiple models and various resource management policies. For example, parallel computing may require several programming models for different parts of an application: one part may need a message passing model and another part may need a shared memory model. In addition, generic high-level abstractions may require different implementations according to the characteristics of an application. For instance, object invocation can use either function shipping, such as RPC, or data shipping, such as distributed shared memory (DSM).

Resource management modules are especially critical for supporting real-time computing and high-performance computing. Supporting multiple policies is important to ensure timing constraints and to effectively reflect the semantic information of applications. The basic functions of resource management modules are the allocation and deallocation of resources such as processors, communication links, and memory. There are two reasons to support multiple policies. First, sophisticated policies are usually very expensive; simpler policies may suit applications that gain no advantage from the sophisticated ones, so the best policy depends on the requirements of the application. Second, a resource management module can exploit the semantic information of applications. For example, a physical memory management module for memory-mapped files can use semantic information about the characteristics of files to determine the global working set, and a new policy can be created by composing several policies, each controlling a different type of file.

In this paper, we describe an object-oriented framework for structuring resource management modules in operating systems. The framework provides policy/mechanism separation and the composition of simple policies within a resource management module. We demonstrate the framework by presenting the implementation of a scheduler in Real-Time Mach [7, 8].

This research was supported in part by the U.S. Naval Ocean Systems Center under contract number N66001-87-C-0155, by the Office of Naval Research under contract number N00014-84-K-0734, by the Defense Advanced Research Projects Agency, ARPA Order No. 7330 under contract number MDA72-90-C-0035, by the Federal Systems Division of IBM Corporation under University Agreement YA-278067, and by the SONY Corporation. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of NOSC, ONR, DARPA, IBM, SONY, or the U.S. Government.
2 Object-Oriented Framework for Resource Management
Encapsulating resource management in an object makes the structure of an operating system clearer and allows multiple policies to be used for scheduling resources. Each resource management module should manage a distinct resource and should be defined as an object with a well-defined interface.
2.1 Policy/Mechanism Separation
Operating systems must control several resources, such as CPUs, memory, and networks. Traditional operating systems use fixed policies for resource management, but operating systems in the near future should be able to add new policies to meet the requirements of applications. Policy/mechanism separation was proposed in Hydra [2] to support multiple scheduling policies: the policy part of resource management is separated from the mechanism part. The policy part provides a well-defined interface and is clearly separated from the other resource management functions. All policies in the same resource management module share the same interface so that the policy part can be replaced dynamically. In our framework, the policy part and the mechanism part are represented as objects. The advantage of this approach is that we can cleanly separate and encapsulate policy and mechanism, and reuse existing policy objects through inheritance.

Figure 1: Resource Management Object [Resource Manager Object (Resource Management Interface); Policy Object (Policy Interface); Mechanism Objects]

2.2 Policy and Mechanism Objects
In our framework, a resource management object consists of three kinds of objects, shown in Figure 1: a resource manager object, a policy object, and mechanism objects. A resource manager object provides an abstract view of resources to other parts of the operating system and to users. Abstract resources share physical resources, and the policy in a resource management object decides the actual allocation sequence of physical resources. For example, a thread resource management object controls the binding between threads and processors, and a memory resource management object controls the allocation of physical pages to a virtual memory object. A policy object encapsulates various policies and provides methods that decide the allocation order of resources. For example, the policy object of a thread resource management object contains processor scheduling algorithms and has methods that decide the preemption and selection of threads. In a transaction resource management object, the policy object contains concurrency control algorithms that decide the order of transactions. Mechanism objects are called, and may be shared, by policy objects. For example, a thread resource management object uses ready queue objects to maintain runnable threads, and the policy object for method invocation may use RPC objects and DSM objects as its underlying communication mechanism objects.

2.3 Composition of Policy Objects
A complex policy object can be created by composing several simple policy objects. To a resource manager object, the composition of policy objects appears as a single object. All policy objects have the same syntactic interface, so any policy object in the composition can be replaced, inserted, or removed. In our framework, policy objects are composed using multiple inheritance, which hides the existence of superclasses from the clients of a class. If a policy object cannot decide the allocation of a resource by itself, it forwards the message to its superclass. Each policy is defined as a separate class, and each such class should implement one simple policy; a policy object is an instance of a class that is a subclass of several classes defining these small policies. A class defining a common policy may be used as a superclass by many classes. Because complex policies are built by inheriting from several small policy classes, controlling multiple inheritance is important for flexible composition: the inheritance mechanism should be customized according to the characteristics of the policy composition.
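As a minimal sketch of Sections 2.2 and 2.3, the Python fragment below shows a resource manager object that delegates decisions to a policy object and keeps its state in a mechanism object, with a composite policy that forwards undecided requests to its next superclass through cooperative `super()` calls. All class and method names (`ThreadManager`, `ReadyQueue`, `select`, and so on) are hypothetical and are not taken from Real-Time Mach. Python's method resolution order is only one possible inheritance mechanism; Section 3.3 argues that the scheduler needs a customized one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False: threads compare by identity
class Thread:
    name: str
    priority: Optional[int] = None  # None: the thread has no fixed priority


class ReadyQueue:
    """Mechanism object: stores runnable threads but makes no policy decisions."""

    def __init__(self) -> None:
        self._threads: List[Thread] = []

    def insert(self, thread: Thread) -> None:
        self._threads.append(thread)

    def remove(self, thread: Thread) -> None:
        self._threads.remove(thread)

    def snapshot(self) -> List[Thread]:
        return list(self._threads)


class Policy:
    """Common policy interface shared by every policy class."""

    def select(self, threads: List[Thread]) -> Optional[Thread]:
        return None  # end of the chain: no policy could decide


class FixedPriority(Policy):
    def select(self, threads):
        ranked = [t for t in threads if t.priority is not None]
        if ranked:
            return max(ranked, key=lambda t: t.priority)
        return super().select(threads)  # cannot decide: forward to the superclass


class Fifo(Policy):
    def select(self, threads):
        if threads:
            return threads[0]
        return super().select(threads)


class FixedPriorityWithFifo(FixedPriority, Fifo):
    """Composite policy: fixed priority first, FIFO for threads without a priority."""


class ThreadManager:
    """Resource manager object: exports the interface and delegates decisions."""

    def __init__(self, policy: Policy, queue: ReadyQueue) -> None:
        self.policy = policy
        self.queue = queue

    def make_runnable(self, thread: Thread) -> None:
        self.queue.insert(thread)

    def dispatch(self) -> Optional[Thread]:
        thread = self.policy.select(self.queue.snapshot())
        if thread is not None:
            self.queue.remove(thread)
        return thread

    def set_policy(self, policy: Policy) -> None:
        self.policy = policy  # same interface, so policies are interchangeable
```

Dispatching threads a (no priority), b (priority 1), and c (priority 5) from such a manager yields c, then b, then a: once no prioritized thread remains, `FixedPriority` forwards the request to `Fifo`, the next class in the composite's resolution order.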
3 Implementation of the Scheduler in Real-Time Mach
3.1 Overview of Real-Time Mach
Real-Time Mach has been developed at CMU to provide a common distributed real-time computing environment. Real-Time Mach is an extension of the Mach kernel and adds four features to the original kernel: a real-time thread model, real-time scheduling, real-time synchronization, and real-time IPC. The major feature of Real-Time Mach is predictable resource management, which enables the timing behavior of applications to be analyzed before they are executed. The resources for time-critical threads are evaluated eagerly. Every object provided by the kernel, such as a thread, a memory object, or a port, has attributes that reflect the requirements of applications.

In Real-Time Mach, a thread is defined as either a real-time or a non-real-time activity. For a real-time thread, additional timing attributes must be defined by a timing attribute descriptor. A real-time thread is classified as periodic or aperiodic, and each class of threads is further defined as soft or hard real-time. The real-time scheduler in Real-Time Mach allows a system designer to predict whether a given task set can meet its deadlines. For soft real-time activities, the designer may predict whether the worst-case response times meet the timing requirements. We adopted a capacity preservation scheme to cope with both hard and soft real-time activities: the necessary processor cycles are divided between the two types of activities. Under a transient overload condition, the scheduler uses the tasks' importance values to decide which tasks should complete their computation and which should be aborted or canceled.

Traditional synchronization primitives use FIFO ordering among the threads waiting to enter a critical section, since FIFO ordering avoids starvation. In a real-time computing environment, however, FIFO ordering often creates a priority inversion problem [4]: a higher-priority thread may have to wait for the completion of all lower-priority threads ahead of it in the waiting queue. Real-time operating systems should therefore support various synchronization policies, based on the queueing order among waiting threads and on the preemptability of threads running in the critical section.

IPC is heavily used in microkernel environments, and predictable IPC is important for building modular and manageable systems. Our IPC can control the receiver of a message and the priority among clients. When creating a new port, we may specify a port attribute to select IPC policies.
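To make the FIFO problem concrete, the Python sketch below compares the order in which a lock is granted to waiting threads under FIFO ordering and under priority ordering. The queue classes and the `grant_order` helper are hypothetical, and a larger number is assumed to mean a higher priority. The sketch models only the queueing order among waiters, not preemptability inside the critical section or priority inheritance [4].

```python
import heapq
import itertools
from collections import deque
from typing import List, Tuple


class FifoWaitQueue:
    """Grants the lock in arrival order, regardless of priority."""

    def __init__(self) -> None:
        self._q: deque = deque()

    def enqueue(self, name: str, priority: int) -> None:
        self._q.append(name)  # priority is ignored

    def dequeue(self) -> str:
        return self._q.popleft()

    def __len__(self) -> int:
        return len(self._q)


class PriorityWaitQueue:
    """Grants the lock to the highest-priority waiter; FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # arrival counter used as a tie-breaker

    def enqueue(self, name: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def grant_order(queue, arrivals: List[Tuple[str, int]]) -> List[str]:
    """Enqueue all waiters, then release the lock repeatedly and record who gets it."""
    for name, priority in arrivals:
        queue.enqueue(name, priority)
    return [queue.dequeue() for _ in range(len(queue))]
```

With arrivals `[("low1", 1), ("low2", 2), ("high", 10)]`, the FIFO queue grants the lock to low1, low2, and then high, so the high-priority thread waits behind both earlier low-priority waiters; the priority queue grants it to high, low2, and then low1.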
3.2 Structure of Real-Time Mach Thread Management
The thread management object in Real-Time Mach provides an interface for controlling threads, such as thread create, thread terminate, thread suspend, and thread resume. Figure 2 shows the structure of the thread resource management object. Real-Time Mach encapsulates the policy and mechanism parts in objects. The thread resource manager exports a thread management interface and calls a scheduling policy object, which encapsulates processor scheduling policies to determine the preemption and selection of threads. A priority queue object is used as the mechanism object in the thread management object; it holds a ready queue that keeps track of runnable threads. Because the queuing strategy is determined by the policy object, the queue object upcalls the policy object to determine the order of threads in the queue. In the rest of this section, we focus on the composition of policy objects.

Figure 2: Real-Time Mach Scheduler [Thread Management Object (Thread Interface); Policy Object (ITDS Interface, ITDS Object) with RR, RM, RMPOLL, FP, and RMDS; Mechanism Object: Priority Queue Object]
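The upcall from the priority queue to the policy object can be sketched as follows. This is a hypothetical Python illustration, not RT-Mach code: the queue keeps threads in order but delegates every ordering decision to the policy's `precedes` method (a name introduced here), so the same mechanism object can serve both a fixed priority policy and a rate monotonic policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Thread:
    name: str
    priority: int = 0    # used by the fixed priority policy (larger runs first)
    period: float = 0.0  # used by the rate monotonic policy (shorter runs first)


class FixedPriorityPolicy:
    def precedes(self, a: Thread, b: Thread) -> bool:
        return a.priority > b.priority


class RateMonotonicPolicy:
    def precedes(self, a: Thread, b: Thread) -> bool:
        return a.period < b.period


class PriorityQueue:
    """Mechanism object: an ordered ready queue that upcalls its policy to compare threads."""

    def __init__(self, policy) -> None:
        self._policy = policy
        self._threads: List[Thread] = []

    def enqueue(self, thread: Thread) -> None:
        # Insert before the first queued thread that the new thread precedes;
        # threads the policy considers equal therefore stay in FIFO order.
        for i, other in enumerate(self._threads):
            if self._policy.precedes(thread, other):  # upcall to the policy object
                self._threads.insert(i, thread)
                return
        self._threads.append(thread)

    def dequeue(self) -> Optional[Thread]:
        return self._threads.pop(0) if self._threads else None
```

Keeping the comparison in the policy means that switching from fixed priority to rate monotonic ordering replaces only the policy object; the queue code is unchanged.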
3.3 Scheduling Policy Objects
Scheduling policies, including those with background servers or aperiodic servers [5], are becoming very complex, and implementing and adding a new scheduling policy to a system is a difficult task. A policy may be very similar to other policies, so that they can share a portion of their code. For example, background threads under the rate monotonic scheduling policy can be scheduled in fixed priority, FIFO, or round-robin order. If we create a separate object for each of these three scheduling policies, most parts of the policies will be redundant, and a policy designer who wants a new scheduling policy must create it from scratch. To solve this problem, we propose the notion of a micro scheduling policy object. A micro scheduling policy object contains only a single simple scheduling policy, such as fixed priority, rate-monotonic, or deferrable server. A micro scheduling policy object is similar to a mixin [6]: it is an incomplete object that is meaningless by itself. We use multiple inheritance to compose policies, and a designer creates a new scheduling policy by composing micro scheduling policy objects, specifying them as its superclasses. Consider the case where a designer wants to create a scheduling policy C that is a composite of a real-time scheduling policy A and a non-real-time scheduling policy B. We can create class C as a new scheduling policy by inheriting from the classes of policy A and policy B.

We decompose a scheduling policy into a base scheduling policy, an aperiodic server policy, and a background scheduling policy. The classes of scheduling policy objects have three explicitly named links that control multiple inheritance: an aps link indicates an aperiodic server policy, a back link indicates a background policy, and a super link refers to a scheduling policy inherited by the current scheduling policy. A base scheduling policy controls the scheduling of all threads, and it may call an aperiodic server policy object, a background policy object, and other policy objects through an aps link, a back link, or a super link, respectively. Inheritance in our framework thus names the links to superclasses explicitly, which shows that a single, fixed inheritance mechanism is not appropriate for all compositions of objects.

The drawback of our approach is the cost of composition: executing a method in the policy interface of a composed policy requires several method calls, and the scheduler is a critical component that affects system performance. Our implementation therefore uses a customized run-time for multiple inheritance and method caches to make method calls faster. The composition of policies does not change unless the current scheduling policy is changed, so the method cache is invalidated only at that time. This strategy reduces the invocation overhead; since traditional object-oriented languages do not provide it, it is implemented in our customized run-time. In the rate monotonic policy, we need only one procedure call for periodic threads and two procedure calls for aperiodic threads.¹
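The named links and the method cache described above can be sketched in Python as follows. The `MicroPolicy` and `Scheduler` classes, the thread kinds ("periodic", "aperiodic", "background"), and the link-walking order (super, then aps, then back) are hypothetical choices for illustration; RT-Mach's customized run-time is not reproduced here. The point is that the link walk is done once per thread kind and cached, and the cache is flushed only when the policy is replaced.

```python
from typing import Dict, Optional, Set


class MicroPolicy:
    """A micro scheduling policy object whose superclasses are reached via named links."""

    def __init__(self, name: str, kinds: Set[str],
                 super_link: "Optional[MicroPolicy]" = None,
                 aps_link: "Optional[MicroPolicy]" = None,
                 back_link: "Optional[MicroPolicy]" = None) -> None:
        self.name = name
        self.kinds = kinds  # thread kinds this object schedules by itself
        self.links = {"super": super_link, "aps": aps_link, "back": back_link}

    def resolve(self, kind: str) -> "Optional[MicroPolicy]":
        """Walk the named links until some object claims the thread kind."""
        if kind in self.kinds:
            return self
        for link_name in ("super", "aps", "back"):
            target = self.links[link_name]
            if target is not None:
                found = target.resolve(kind)
                if found is not None:
                    return found
        return None


class Scheduler:
    """Caches resolved handlers; the cache is flushed only when the policy changes."""

    def __init__(self, policy: MicroPolicy) -> None:
        self._policy = policy
        self._cache: Dict[str, Optional[MicroPolicy]] = {}
        self.walks = 0  # number of full link walks, for illustration

    def handler_for(self, kind: str) -> Optional[MicroPolicy]:
        if kind not in self._cache:
            self.walks += 1
            self._cache[kind] = self._policy.resolve(kind)
        return self._cache[kind]

    def set_policy(self, policy: MicroPolicy) -> None:
        self._policy = policy
        self._cache.clear()  # the only invalidation point
```

For a composite wired like RMRRSS (RM on the super link, SS on the aps link, RR on the back link), the first request for aperiodic threads walks the links to SS; later requests are served from the cache until `set_policy` installs a different composition.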
3.4 Micro Scheduling Policy Objects
The current version of RT-Mach supports nine micro scheduling policy objects: round-robin (RR), first-in first-out (FIFO), fixed priority (FP), earliest deadline first (EDF), rate-monotonic (RM), deadline monotonic (DM), polling server (POLL), deferrable server (DS), and sporadic server (SS) [5], where POLL, DS, and SS are aperiodic server policies. RM and DM can use DS or SS for sporadic threads and FP, FIFO, or RR for background threads. From these objects, we support 28 scheduling policy objects in total.

We now explain how micro scheduling policy objects are composed, using the RMRRSS policy as an example. RMRRSS is a scheduling policy object in which periodic threads are scheduled by rate-monotonic scheduling and aperiodic threads are executed by sporadic servers; the aperiodic threads bound to the same sporadic server are scheduled in round-robin order. Figure 3 shows the class hierarchy of the RMRRSS scheduling policy in our scheduler. RMAPS is a scheduling policy that uses the rate monotonic scheduling policy as its base scheduling policy, combined with an arbitrary aperiodic server policy. Periodic threads are scheduled by the RM object through a super link; aperiodic threads bound to aperiodic servers are scheduled by the SS object through an aps link; and aperiodic threads executed as background threads are scheduled by the RR object through a back link.

Figure 3: Structure of Policy Module [class hierarchy of SchedObj, FIFO, RM, DS, SS, RR, RMAPS, and RMRRSS, connected by super links, an aps link, and a back link]

¹ In our implementation, the rate monotonic scheduling module needs to call a background scheduling module.
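A rough Python sketch of how an RMRRSS-style composite might choose the next thread through its three links follows. It is heavily simplified and hypothetical: the sporadic server is modeled only by a period (its RM priority) and a remaining budget, budget replenishment is omitted, and an aperiodic thread is represented by `period=None`. The class names mirror the micro policies in the text, but their methods (`pick`, `select`, `consume`) are our own.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Thread:
    name: str
    period: Optional[float] = None  # None marks an aperiodic thread


class RM:
    """Rate monotonic micro policy: the periodic thread with the shortest period wins."""

    def pick(self, ready: List[Thread]) -> Optional[Thread]:
        periodic = [t for t in ready if t.period is not None]
        return min(periodic, key=lambda t: t.period, default=None)


class RR:
    """Round-robin micro policy over the threads it is given."""

    def __init__(self) -> None:
        self._order: deque = deque()

    def pick(self, threads: List[Thread]) -> Optional[Thread]:
        for t in threads:
            if t not in self._order:
                self._order.append(t)  # newcomers join the end of the round
        self._order = deque(t for t in self._order if t in threads)  # drop departed threads
        if not self._order:
            return None
        chosen = self._order.popleft()
        self._order.append(chosen)  # rotate the chosen thread to the back
        return chosen


class SS:
    """Simplified sporadic server: fixed period and a budget, no replenishment."""

    def __init__(self, period: float, budget: float) -> None:
        self.period = period
        self.budget = budget
        self._rr = RR()  # aperiodic threads bound to this server share it round-robin

    def pick(self, aperiodic: List[Thread]) -> Optional[Thread]:
        return self._rr.pick(aperiodic) if self.budget > 0 else None

    def consume(self, amount: float) -> None:
        self.budget -= amount


class RMRRSS:
    """Composite policy wired through a super link, an aps link, and a back link."""

    def __init__(self, server: SS) -> None:
        self.super_link = RM()
        self.aps_link = server
        self.back_link = RR()

    def select(self, ready: List[Thread]) -> Optional[Thread]:
        aperiodic = [t for t in ready if t.period is None]
        periodic = self.super_link.pick(ready)
        server = self.aps_link
        # The server competes with periodic threads at the RM priority of its own period.
        if aperiodic and server.budget > 0 and (periodic is None or server.period < periodic.period):
            return server.pick(aperiodic)
        if periodic is not None:
            return periodic
        # Reached with aperiodic work only when the server has no budget: run it in background.
        return self.back_link.pick(aperiodic)
```

With a server of period 20, a periodic thread of period 10 runs before aperiodic work, while one of period 40 yields to the server until its budget is consumed; after that, aperiodic threads run round-robin only when no periodic thread is ready.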
4 Discussion
In this section, we discuss five topics concerning our framework. The first three are related to policy management, and the remaining two are related to the implementation of the framework.

How to Select Policies: Our framework provides a mechanism for changing resource management policies, but the policies must still be selected by application designers. We need a systematic approach to choosing policies based on the requirements of applications. For example, a continuous media application can specify quality of service parameters that reflect its requirements for delay, jitter bound, and communication throughput. A network resource management object then translates these high-level specifications into low-level parameters and selects the policies.

Hierarchical Policies: Some types of resource management objects need to control a group of resources hierarchically. In the simple two-level case, a policy can be divided into global and local policies. For example, a first-class user-level thread package may use two-level hierarchical scheduling policies: a local policy controls the scheduling of threads within an address space, and a global policy decides which address space acquires a processor. A memory management object can also use two-level policies, in which a global policy selects a suitable set of local memory resources and delegates requests to the local policy. The important point is that some global policies must be consistent with the local policies, so we need a mechanism that constrains the selection of policies to prevent misuse of policy combinations.

Coordination of Conflicting Policies: Different resource management objects may need to coordinate their policies and resolve conflicts between them. For example, a processor allocation policy and a cache memory management policy are closely related. If a processor's cache still holds part of the code, data, and stack segments of a thread, it is better to assign that processor to the thread, because doing so avoids expensive cache invalidation and reloading. Real-time computing requires strict coordination of all resource management objects to avoid unbounded priority inversion and to provide predictable computing. Different scheduling domains also need to agree on the meaning of priorities and on the mapping of priorities between domains. All policy objects should be constrained by the global scheduling policies, and the replacement of a global policy should be reflected in each local policy.

Flexible Mechanism for Composition: In our framework, the composition of policy objects uses inheritance whose semantics and syntax are defined by a language or a run-time library. However, different resource management modules may require different inheritance mechanisms, so we need a flexible framework for changing the semantics of inheritance. On the other hand, a flexible mechanism may increase design errors, because flexible mechanisms are hard to use for the policy designers who implement policy objects. A well-designed language syntax restricts the use of the mechanism and reduces misuse of composition. The language for implementing policy objects should therefore support changing not only the semantics of inheritance but also its syntax. A method cache is used to reduce the cost of method invocations.
Each resource management object may use a different method cache strategy. For example, our scheduler framework does not need to invalidate the cache unless the scheduling policy is changed. In other frameworks, however, policy objects may be created and deleted frequently, and each resource management object may need its own object allocation and garbage collection policies.

Placement and Binding of Objects: We need to consider which objects should be placed in user space and which in kernel space, and how objects communicate with each other. In our implementation, all objects reside in kernel space and object invocations are implemented by indirect function calls, but supporting multiple models requires placing objects in user space. We may then be able to share policy objects and mechanism objects among different resource manager objects that implement different models. If policy objects can be placed in user space, adding and debugging them becomes much easier. Using both upcalls and downcalls between user and kernel space makes it possible to build multiple models in the same system: some policy and mechanism objects may reside in kernel space and others in user space. Method invocations can hide the placement of objects and provide transparent access to them.
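The two-level scheduling described under Hierarchical Policies can be sketched as follows: a global policy chooses an address space, and a local policy chooses a thread inside it. This is a hypothetical Python illustration (round-robin globally, priority locally, with class names of our own); it does not model the consistency constraints between global and local policies that the discussion calls for.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Thread:
    name: str
    priority: int


@dataclass(eq=False)
class AddressSpace:
    name: str
    threads: List[Thread] = field(default_factory=list)  # runnable threads


class LocalPriorityPolicy:
    """Local policy: the highest-priority runnable thread in one address space."""

    def select(self, space: AddressSpace) -> Optional[Thread]:
        return max(space.threads, key=lambda t: t.priority, default=None)


class GlobalRoundRobinPolicy:
    """Global policy: rotates the processor among address spaces with runnable threads."""

    def __init__(self, spaces: List[AddressSpace]) -> None:
        self._spaces = deque(spaces)

    def select(self) -> Optional[AddressSpace]:
        for _ in range(len(self._spaces)):
            space = self._spaces[0]
            self._spaces.rotate(-1)  # the next call starts with the following space
            if space.threads:
                return space
        return None


class TwoLevelScheduler:
    def __init__(self, global_policy: GlobalRoundRobinPolicy,
                 local_policy: LocalPriorityPolicy) -> None:
        self.global_policy = global_policy
        self.local_policy = local_policy

    def dispatch(self) -> Optional[Thread]:
        space = self.global_policy.select()  # global decision: which address space runs
        if space is None:
            return None
        return self.local_policy.select(space)  # local decision: which thread inside it
```

Either level can be replaced independently, for example a global policy that weights address spaces, without touching the other, which is where the consistency constraints mentioned above would have to be enforced.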
5 Conclusion
We have demonstrated a new object-oriented framework for structuring resource management modules in operating systems. The framework provides policy/mechanism separation and the composition of simple policies within a resource management module. We plan to extend the framework to achieve better policy management and a more efficient implementation.
References
[1] J. S. Chase, H. M. Levy, E. D. Lazowska, and M. Baker-Harvey, "Lightweight Shared Objects in a 64-bit Operating System", University of Washington, TR-92-03-09, 1992.
[2] R. Levin, E. Cohen, W. Corwin, F. Pollack, and W. Wulf, "Policy/Mechanism Separation in HYDRA", In Proceedings of the 5th Symposium on Operating Systems Principles, 1975.
[3] "Proceedings of the International Workshop on Network Operating System Support for Digital Audio and Video", International Computer Science Institute, TR90-062, 1990.
[4] L. Sha, R. Rajkumar, and J. P. Lehoczky, "Priority Inheritance Protocols: An Approach to Real-Time Synchronization", IEEE Transactions on Computers, Vol. 39, No. 9, 1990.
[5] B. Sprunt, L. Sha, and J. P. Lehoczky, "Aperiodic Task Scheduling for Hard-Real-Time Systems", The Journal of Real-Time Systems, Vol. 1, No. 1, 1989.
[6] M. Stefik and D. Bobrow, "Object-Oriented Programming: Themes and Variations", The AI Magazine, Vol. 6, No. 4, 1986.
[7] H. Tokuda, T. Nakajima, and P. Rao, "Real-Time Mach: Towards a Predictable Real-Time System", In Proceedings of the USENIX Mach Workshop, October 1990.
[8] H. Tokuda and T. Nakajima, "Evaluation of Real-Time Synchronization in Real-Time Mach", In Proceedings of the USENIX 2nd Mach Symposium, 1991.