Real-Time Scheduling and Synchronization in Real-Time Mach Hideyuki Tokuda and Tatsuo Nakajima

School of Computer Science Carnegie Mellon University Pittsburgh, Pennsylvania 15213 [email protected]

Abstract

A micro kernel-based operating system architecture is becoming common for advanced distributed computing systems. The advantages of using such a micro kernel for real-time applications are that the preemptability of the kernel is better, the size of the kernel becomes much smaller, and the addition of new services is easier. However, such a micro kernel alone cannot provide a predictable, distributed real-time computing environment due to the many unpredictable delays caused by unbounded priority inversions. In this paper, we report new extensions for supporting real-time threads, synchronization, and scheduling in Real-Time Mach. These extensions are based on a real-time thread model and an integrated time-driven scheduler model. We describe the interface, implementation results, and measured performance of the system functions. Our benchmark results demonstrate that policy/mechanism separation of the real-time scheduling and synchronization facility is important. A proper choice of scheduling and locking policy can avoid unbounded priority inversions and improve the processor schedulability and worst case response time significantly.


This research was supported in part by the U.S. Naval Ocean Systems Center under contract number N66001-87-C-0155, by the Office of Naval Research under contract number N00014-84-K0734, by the Defense Advanced Research Projects Agency, ARPA Order No. 7330 under contract number MDA72-90-C-0035, by the Federal Systems Division of IBM Corporation under University Agreement YA-278067, and by the SONY Corporation. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of NOSC, ONR, DARPA, IBM, SONY, or the U.S. Government.


Contents

1 Introduction
2 Motivation
3 Real-Time Thread and Scheduling Model
  3.1 Real-Time Thread Model
  3.2 Integrated Time-Driven Scheduling Model
  3.3 Real-Time Synchronization
4 Implementation
  4.1 RT-Mach Platform
    4.1.1 Micro RTS Environment
  4.2 System Interface
    4.2.1 Real-Time Thread
    4.2.2 Real-Time Synchronization
    4.2.3 Scheduling Policy Interface
  4.3 Internal Structure
    4.3.1 Policy/Mechanism Separation
    4.3.2 ITDS Scheduler
    4.3.3 Synchronization Module
5 Evaluation and Schedulability Analysis
  5.1 Cost Analysis and Predictability
    5.1.1 Preemption Cost Analysis
    5.1.2 Thread Management
    5.1.3 Scheduler Management
    5.1.4 Synchronization Management
  5.2 Benchmark and Schedulability Analysis
    5.2.1 The Effect of Priority Inversion
    5.2.2 Schedulability Analysis under Priority Inversion
    5.2.3 Response Time Analysis for Aperiodic Threads
6 Related Work
7 Conclusion and Further Work

1 Introduction

Micro kernel-based distributed operating systems are becoming common for advanced distributed computing systems. A micro kernel provides basic resource management functions such as processor scheduling, memory object management, an IPC facility, and low-level I/O support [1, 24, 5, 21]. Traditional operating system functions such as the file system and network services are all provided in a server task which runs as a user-level application program.

At Carnegie Mellon University, the Mach operating system has been developed for a variety of machine architectures including MIMD multiprocessor architectures, advanced workstations, and laptop machines. The pure kernel's extensibility, tailorability, and portability have been demonstrated by supporting two rather diverse operating system environments: 4.3BSD UNIX on a variety of platforms and the Macintosh OS on the Macintosh II [10]. A new challenge is to extend the pure kernel to create a set of new servers to support real-time, secure, or fault-tolerant computing domains. There are a few results related to a secure version of Mach [8] and a fault-tolerant extension [4, 9, 2]. However, no investigations have been reported on extending Mach for the distributed real-time computing domain.

Our group has been working on a real-time version of the Mach kernel in order to bridge the gap between a traditional real-time executive and a time-sharing operating system, like UNIX. Unlike a traditional real-time system in a single processor domain, Real-Time Mach has been designed to provide a predictable, distributed real-time computing environment [33]. Our aim is to allow a system designer to analyze the runtime behavior at the design stage and predict whether the given real-time tasks having various types of system and task interactions (e.g., memory allocation/deallocation, message communications, I/O interactions, etc.) can meet their timing requirements.
However, system designers of such complex systems lack systematic development methods and analysis tools, so they resort to ad hoc methods to develop, test, and verify real-time systems [27]. For processor scheduling, for instance, the cyclic executive model which uses time line analysis to schedule real-time activities is not suitable for a distributed real-time application. It is very difficult to test and tune the executive based on some changes in the task set or its timing requirements for complex real-time systems [13].

In this paper, we describe the new scheduling and real-time synchronization facilities in Real-Time Mach along with their implementation and performance evaluation. In Section 2, we first describe our motivation and address the potential problems with the existing scheduling and synchronization techniques in traditional real-time systems. Section 3 introduces the real-time thread model, real-time synchronization primitives, and an integrated time-driven scheduler (ITDS). Section 4 shows the kernel interface and implementation details of the proposed functions. Section 5 discusses the performance evaluation and demonstrates the effectiveness of our approach using some benchmark programs. In Section 6, we compare our approach to other micro kernel-based systems. Section 7 summarizes the development status and considers future work.

2 Motivation

Although the UNIX user community has been sharing many programs and programming tools successfully, real-time programmers have never had such a common computing environment for various real-time applications. This is mainly due to the lack of a common real-time operating system

and the lack of a standard programming model, like a real-time thread interface and synchronization mechanisms. Many real-time applications, on the other hand, are getting more complex and need to use a network of a wide variety of hardware platforms such as single board computers, advanced workstations, and multiprocessors. A commercial real-time executive can support a single board computer well with a small amount of memory, but often lacks a real-time network facility and multiprocessor support. Various commercial real-time versions of UNIX can provide good interoperability through TCP/IP-based network support, but a round-robin based processor scheduling policy and FIFO-based message handling in protocol modules are unacceptable among real-time tasks.

The major goal of Real-Time Mach is to provide a uniform, predictable distributed real-time computing environment for a network of various types of hardware platforms. Our aim was to bridge the gap between real-time executives and real-time UNIX for distributed applications. The micro kernel-based approach was taken since it has the potential to improve the preemptability of the kernel, because all UNIX-related code is now in a user-level task. Because of its size, the micro kernel can be size-competitive with traditional real-time executives without losing functionality. Furthermore, it is much easier to modify the micro kernel and add a new server task on top of the kernel.

Let us now consider the following problems we often face in building a distributed real-time application using traditional real-time executives and UNIX:

     

- Priority inversion problem
- Unpredictable periodic activity
- Hardwired scheduling policy
- Implicit binding problem (between a code and timing constraint)
- Transient overload problem
- Gap between scheduling theory and programming model

A priority inversion problem occurs in a system when a high priority activity waits for a lower priority activity to complete. In particular, let us assume that the high and lower priority threads share a critical section in the system, and the lower priority thread first becomes runnable and enters the critical section. While the lower priority thread is in the critical section, the high priority thread must wait for the lower thread. Now suppose that another thread with a medium priority becomes runnable. This thread begins its computation without any need for this critical section and can preempt the low priority thread indefinitely; thus we have potentially unbounded priority inversion. It is often difficult to bound a system service time or the worst case blocking time due to the priority inversion problem. Detecting and avoiding such priority inversion problems in actual systems is also very difficult since the kernel does not keep track of which thread is running in what critical section and under what priority level.

In communication scheduling, we often encounter more serious priority inversion problems than in processor scheduling since part of the protocol implementation is done by simple hardware and software modules. A priority inversion occurs when outgoing or incoming messages are queued in FIFO order. The annoying nature of the problem is that message or

packet queueing may occur at more than one level of protocol layering in the system. A simple inversion problem at one level may cause further problems at the rest of the layers. Thus, the system must avoid the inversion problems in a consistent manner across the protocol layers [32].

The second problem is related to the management of periodic activities in a real-time system. In a traditional system, it is very common to use the following program structure to implement a periodic process.

while (TRUE) {
    ...
    delay_time = next_time - current_time;
    sleep(delay_time);
}

If a system uses fixed priority preemptive scheduling, delay_time may get a wrong value. If the system preempts this thread just after current_time is fetched and then resumes the thread later, delay_time is computed from a stale value of current_time. So the program might sleep for a longer duration than it is supposed to in the sleep statement.

The hardwired scheduling policy problem is that most of the existing UNIX-like operating systems do not allow a user to implement or choose a proper scheduling policy for the user's real-time application. The operating system architecture is closed in the sense that no user can tailor the built-in scheduling policy for a special real-time application. The scheduling facility should be implemented by separating the policy and mechanism modules, so that each application can choose or even implement the scheduling policy suitable for the application.

The implicit binding problem is due to a bad practice of implicit binding between the program and its timing constraint. It is very difficult to reuse any existing real-time programs, since a new user cannot extract any timing information from the given program file. In order to reason about the real-time behavior, an explicit specification of the timing constraint should be adopted in a real-time program. Similarly, it is very difficult to track down a timing bug in a real-time application. The timing bug tends to penetrate a normal software module boundary and is sometimes not reproducible. It is a very difficult problem to localize a timing error in the system.

The transient overload problem is a source of yet another unpredictable system behavior in many traditional real-time operating systems. When the system faces a transient overload condition due to excess external events, the system may not be able to complete all activities in the system. Under such a condition, a normal scheduler cannot distinguish which tasks are most important to the system.
So it often aborts threads in an arbitrary fashion, and it becomes very difficult to control the system behavior.

Finally, we need to bridge the gap between the current computing practice in a real-time application and the results from various scheduling theories. Traditional processor scheduling models do not indicate how a real-time thread is specified in a real-time application; a thread is reduced to simple integers representing its execution time and period. What we need is an operational interface which provides both the ability to perform schedulability analysis and a good programming interface.

In the following section, we will describe our real-time thread model. The model can provide a simple way of defining periodic and aperiodic activities and can be a base for schedulability analysis. We will also introduce our integrated time-driven scheduling (ITDS) model and describe how the system can support hard and soft real-time threads and synchronization among them, and how to perform schedulability analysis.

[Figure 1: Timing attributes of a periodic thread. S: start time, O: phase offset, D: deadline, T: period, C: worst case execution time, d: scheduling delay.]

3 Real-Time Thread and Scheduling Model

The objective of the RT-thread model is to support a predictable real-time scheduler and provide a uniform system interface to both real-time and non-real-time threads. It is also aimed at providing a better interface for adopting well-known schedulability analysis techniques, such as rate monotonic scheduling. In this section, we will describe the real-time thread model and the real-time scheduling model we have implemented in Real-Time Mach.

3.1 Real-Time Thread Model

A thread can be defined for a real-time or non-real-time activity. Each thread is specified by at least a procedure name and a stack descriptor which specifies the size and address of the local stack region. For a real-time thread, additional timing attributes must be defined by a timing attribute descriptor.

A real-time thread can also be defined as a hard real-time or soft real-time thread. By a hard real-time thread, we mean that the thread must complete its activities by its hard deadline time; otherwise it will cause undesirable damage or a fatal error to the system. The soft real-time thread, on the other hand, does not have such a hard deadline, and it still makes sense for the system to complete the thread even if it has passed its critical (i.e., soft deadline) time.

A real-time thread can also be defined as a periodic or aperiodic thread based on the nature of its activity. A periodic thread P_i is defined by the worst case execution time C_i, period T_i, start time S_i, phase offset O_i, and the task's semantic importance value V_i. For a periodic thread, a new instantiation of the thread will be scheduled at S_i and then repeat the activity in every T_i. The phase offset is used to adjust the ready time within each period. If a periodic thread is a soft real-time thread, it may need to express the abort time which tells the scheduler to abort the thread. Figure 1 depicts the timing attributes of a hard periodic real-time thread.

An aperiodic thread AP_j is defined by the worst case execution time C_j, the worst case interarrival time A_j, deadline D_j, and the task's semantic importance value V_j. In the case of soft real-time threads, A_j indicates the average case interarrival time and D_j represents the average response time. An abort time can also be defined for the soft real-time thread. Figure 2 depicts the timing attributes of a hard aperiodic real-time thread. [2]

[Footnote 2: When a hard real-time thread is aperiodic, we call it a sporadic thread, where consecutive requests of the task initiation are kept at least Q units of time apart [20].]

[Figure 2: Timing attributes of an aperiodic thread. A: worst case interarrival time, D: deadline, d: scheduling delay, C: worst case execution time, e_i: external event.]

3.2 Integrated Time-Driven Scheduling Model

The objective of the integrated time-driven scheduling model is to provide predictability, flexibility, and modifiability for managing both hard and soft real-time activities in various real-time applications. For the hard real-time activities, the ITDS model allows a system designer to predict whether the given task set can meet its deadlines or not. For the soft real-time activities, the designer may predict whether the worst case response times meet the requirements or not. Under a transient overload condition, the ITDS model uses the tasks' importance values to decide which task should complete its computation and which should be aborted or canceled.

In the ITDS model, we adopted a capacity preservation scheme to cope with both hard and soft real-time activities. By capacity preservation we mean that we divide the necessary processor cycles between the two types. [Footnote 3: Capacity-based scheduling algorithms are used in different types of schedulers, such as the Fair Share Scheduler [11] and FDDI's capacity allocation scheme [23]. However, our motivation was not fairness but rather to reduce the response time of aperiodic activities while guaranteeing that the periodic tasks' deadlines are met.] We first analyze the necessary processor cycles by accumulating the total utilization of the hard periodic and sporadic activities (i.e., \sum_{i=1}^{k} C_i/T_i) and then apply rate monotonic scheduling policies [18].

It should be noted that our reason for choosing a rate monotonic or deadline monotonic paradigm instead of a dynamic scheduling policy, like earliest deadline, is that it is extremely difficult to utilize such deadline information for scheduling a message at the media access level or a bus transaction on a system internal bus. At least, given a set of real-time activities, we can easily map the periods into a fixed integer priority domain. For instance, given a set of periodic, independent tasks in a single processor environment, with the rate monotonic scheduling algorithm the worst case schedulable bound is 69% [18], the average case is 88% [17], and the best case, where threads have harmonic periods, is up to 100% of the CPU utilization.

After analyzing the hard real-time activities, we assign the remaining schedulable amount of the unused processor cycles to the soft real-time tasks. In order to guarantee that no soft task will consume more than the allotted processor cycles, we create a special periodic server, called the deferrable server [26]. (Even though we call it a server, it is not implemented as an ordinary task.) The deferrable server preserves processor cycles during its period, and if there is an aperiodic request, the server assigns the remaining processor cycles to the requesting task. If the requesting task cannot complete its execution within the deferrable server's capacity, it must block until the beginning of the next cycle of the server. Otherwise, there is a

possibility that other periodic tasks may not meet their deadlines.

3.3 Real-Time Synchronization

When tasks synchronize with each other by sharing a resource, schedulability analysis becomes a bit more complex than for an independent task set. There are two approaches to bounding the worst case blocking time among tasks. One is a kernelized monitor and the other is a priority inheritance scheme. In the kernelized monitor protocol, while a task is executing in a critical section, the system will not allow any preemption of that task. Suppose that n tasks are scheduled in earliest deadline first order; then the worst case schedulable bound is defined as follows:

    \sum_{j=1}^{n} (C_j + CS) / T_j <= 1

where C_j, T_j, and CS represent the total computation time of Thread_j, the period of Thread_j, and the worst case execution time of the critical section, respectively. In general, if the duration of CS is too big, the system cannot satisfy the schedulability test and ends up reducing the number of tasks and running with a very low total processor utilization. On the other hand, if we relax the kernelized monitor and allow preemption during the critical section, we face the unbounded priority inversion problem that we discussed in Section 2.

One way of avoiding the priority inversion is to use a priority inheritance protocol [25]. Once the higher priority task blocks on the critical section, the low priority task inherits the high priority from the higher priority task. In this way, the highest priority thread's worst case blocking time is bounded by the size of the critical region, independent of the medium priority tasks. One of the efficient priority inheritance protocols is called the priority ceiling protocol (PCP) [22]. Using these inheritance protocols under a rate monotonic policy, we can also check the schedulable bound for n periodic threads as follows:

    \forall i, 1 <= i <= n:   \sum_{j=1}^{i} C_j / T_j + B_i / T_i <= i (2^{1/i} - 1)

where C_i, T_i, and B_i represent the total computation time, the period, and the worst case blocking time of Thread_i, respectively.

The worst case bound of n tasks from the above formula is very pessimistic since it must handle an arbitrary set of tasks. In general, tasks often have harmonic periods, and randomly generated task sets are often schedulable by the rate monotonic scheduling algorithm at much higher utilization levels. The following formula can be used to check the exact behavior of a task set for schedulability [22]:

    \forall i, 1 <= i <= n:   \min_{(k,l) \in R_i} [ \sum_{j=1}^{i-1} (C_j / (l T_k)) \lceil (l T_k) / T_j \rceil + (C_i + B_i) / (l T_k) ] <= 1

where R_i = {(k, l) | 1 <= k <= i, l = 1, ..., \lfloor T_i / T_k \rfloor}.

These formulas allow us to predict the schedulability of a given set of tasks. In Section 5, we will use the above formulas for predicting the schedulability of periodic tasks and for comparing the measured results.

4 Implementation

The current version of RT-Mach is being developed using a network of SUN and SONY workstations, laptops, and single board target machines. We first experimented with our real-time thread model using a modified version of the Release 2.5 Mach kernel which can support fixed priority thread scheduling. We then moved to the current pure kernel based environment. The pure kernel provided a much better execution environment in which we can reduce unexpected delays in the kernel and can run real-time threads without having a UNIX server. The preemptability of the kernel was also improved significantly since UNIX primitives and some device drivers are no longer in the kernel.

In this section, we will first describe our RT-Mach platforms and a micro real-time server (RTS) environment. We then discuss the new RT-Mach interfaces for real-time threads. Lastly, we present the internal structure of the scheduling and synchronization modules.

4.1 RT-Mach Platform

A platform for RT-Mach has different requirements in terms of its target execution environment. For embedded real-time applications, we need to support not only various types of workstations, but also a wide variety of single-board or multiple-board based target machines. It is very common to have a target board without a local disk, a local console, or a keyboard. Interaction between a user and the target board often takes place through a network or a serial or parallel communication line.

4.1.1 Micro RTS Environment

One unique feature of RT-Mach is that it can provide various execution environments for real-time embedded programs with or without UNIX facilities. We call this a Micro RTS environment. The current Micro RTS environment can be created for user application programs with the following four levels of system support.

Level 1: a micro RTS
Level 2: a micro RTS and a real-time file system
Level 3: a micro RTS, a real-time file system, and a simple UNIX-like server, POE
Level 4: a micro RTS and a 4.3BSD UNIX server

The Level 1 support is for simple real-time application programs where real-time task/thread management and synchronization are provided. RTS allows a user program to initiate real-time tasks from a memory-resident configuration file without having any additional file system. Level 2 adds a memory-resident real-time file system. It provides memory mapped file access support for a real-time task. Level 3 adds POE, a UNIX-like operating system environment, on top of the real-time file system. At Level 3, a user can issue most of the UNIX-like system primitives and mount and unmount a local disk-based file system and the real-time file system. At Level 4, a user program can issue 4.3BSD UNIX system calls in addition to the Mach kernel primitives. All basic UNIX features like TCP/IP networking services, sockets, files, and pipes are provided by the UNIX server.

In all cases, a user can create an embedded application program on top of the RT-Mach kernel. Unlike a regular UNIX environment, a single bootable image file can contain the embedded application programs and the kernel. Thus, all necessary real-time programs can be downloaded at the time of system booting.

4.2 System Interface

This section describes the new system interfaces added for RT-Mach. The interface is divided into three categories: the thread interface, the synchronization interface, and the scheduling policy interface. The rest of the section presents each interface.

4.2.1 Real-Time Thread

A thread can be created, within a task, by using the rt_thread_create primitive. As we described in the model, it can be a periodic or aperiodic thread depending on its timing attributes. The timing attributes are specified in the corresponding time descriptor, and the user stack regions are also given by the stack descriptor. If creation is successful, a unique thread port will be returned. A thread can be terminated by calling the rt_thread_exit primitive. If a thread is a periodic thread, a new instantiation of the thread will be scheduled for the next start time and a new thread id will be assigned. The rt_thread_kill primitive terminates the specified thread, while the rt_thread_wait primitive blocks the caller thread until the target thread terminates. The rt_thread_self primitive returns the thread id of the caller. The rt_thread_set_attribute and rt_thread_get_attribute primitives are used to assign or get the value of the attribute, respectively. A brief summary of the thread interface and the thread attribute is shown below.

kval_t   = rt_thread_create( parent, child_thread, thread_attr, entry_point, arg )
kval_t   = rt_thread_exit( )
kval_t   = rt_thread_kill( thread )
kval_t   = rt_thread_wait( thread )
thread_t = rt_thread_self( )
kval_t   = rt_thread_set_attribute( thread, thread_attr )
kval_t   = rt_thread_get_attribute( thread, thread_attr )

typedef struct time_desc {
    int rt_type;                      /* periodic or aperiodic thread */
    union {
        struct rt_Periodic {
            time_value_t rt_start;    /* start time */
            time_value_t rt_period;   /* period or response time info */
            time_value_t rt_offset;   /* phase offset */
        } rt_periodic;
        struct rt_Aperiodic {
            time_value_t rt_wcia;     /* worst case interarrival time */
            apserver_t rt_apserver;   /* aperiodic server number */
        } rt_aperiodic;
    } rt_attribute;
    time_value_t rt_wcec;             /* worst case exec time */
    time_value_t rt_deadline;         /* deadline */
    time_value_t rt_abort;            /* abort time */
    int rt_value;                     /* semantic value */
    ...
} time_desc_t;

typedef struct stack_desc {
    vm_address_t rt_stack_addr;
    vm_size_t rt_stack_size;
    ...
} stack_desc_t;

typedef struct thread_attribute {
    time_desc_t time_desc;
    stack_desc_t stack_desc;
    ...
} thread_attr_t;

4.2.2 Real-Time Synchronization

Synchronization among threads is necessary since all threads within a task share the task's resources. The synchronization mechanism in RT-Mach is based on mutual exclusion using a lock variable. A thread can allocate, deallocate, and initialize a lock variable. A simple pair of rt_mutex_lock and rt_mutex_unlock primitives is used to specify mutual exclusion. The rt_mutex_trylock primitive is used for acquiring the lock conditionally. We also support a modified version of condition variables, but we only describe the mutual exclusion primitives in this paper.

A lock attribute can be set at the allocation time of its lock variable. The attribute specifies the synchronization policy which determines the queueing policy and the priority ordering. The interface and a brief description of the synchronization functions and lock attribute are as follows.

kval_t = rt_mutex_allocate( lock, lock_attr )
kval_t = rt_mutex_deallocate( lock )
kval_t = rt_mutex_lock( lock, timeout )
kval_t = rt_mutex_unlock( lock )
kval_t = rt_mutex_trylock( lock )

typedef struct lock_attr {
    type_t rt_type;          /* lock type */
    priority_t rt_priority;  /* ceiling priority */
    ...
} lock_attr_t;

rt_type indicates a lock policy given by users. rt_priority is used for the ceiling protocol and specifies the ceiling priority. If a NULL value is set as the lock attribute, a default policy (i.e., the basic policy; see Section 4.3.3) is chosen.

4.2.3 Scheduling Policy Interface

The system uses the fixed priority policy as the default policy; however, we can change scheduling policies by using the following functions. rt_sched_get_policy gets the current scheduling policy, and rt_sched_set_policy sets and activates a new scheduling policy. rt_aperiodic_server_create creates a new aperiodic server, and rt_aperiodic_server_bind binds the thread to the aperiodic server indicated by the argument. The scheduling policy attribute is used to pass policy-specific arguments such as an aperiodic server's period and capacity. The following is a summary of the interface and a brief description of the policy attribute.

kval_t = rt_sched_get_policy( policy_attr )
kval_t = rt_sched_set_policy( policy_attr )
kval_t = rt_aperiodic_server_create( policy_attr )
kval_t = rt_aperiodic_server_bind( policy_attr )

typedef struct policy_attr {
    policy_t rt_policy;           /* scheduling policy */
    policy_t rt_server_type;      /* type of aperiodic server */
    time_value_t rt_capacity;     /* capacity of aperiodic server */
    time_value_t rt_period;       /* period of aperiodic server */
    ...
} policy_attr_t;

rt_policy specifies the base scheduling policy, such as the rate monotonic policy. The default policy is used if a user specifies NULL as the scheduling policy. rt_server_type specifies the type of aperiodic server. rt_capacity is the capacity of the aperiodic server, and rt_period is the period of the aperiodic server.

4.3 Internal Structure

This section presents the internal structure of RT-Mach. We focus only on the structure of the ITDS scheduler and the synchronization module.

4.3.1 Policy/Mechanism Separation

The policy/mechanism separation concept is a structuring concept for operating system functions. We applied the technique to processor scheduling and synchronization management: each facility is cleanly separated into policy modules and a mechanism module. For example, a scheduling policy module encapsulates a scheduling policy such as FCFS, round robin, or a priority-based algorithm. The important issues in policy/mechanism separation are:

Separation of policies: how the policy module should be separated from the mechanism.

Placement of policies: where the policy module should be placed in the system.

Communication between policy and mechanism: how they should communicate (interact) with each other.

Selection of policy: how the system or user can select the current (or active) policy at run time.

A traditional policy/mechanism separation can be achieved by defining different functions or procedures for each policy and mechanism, but it is more natural to make a policy module an abstract data type or object, since it is a self-contained unit. The placement of a policy should be determined by the modifiability of the policy object and the system overhead involved in accessing it from the lower level mechanisms. The communication scheme impacts the system overhead significantly when the policy object is related to very low-level system functions; if the policy object is defined as an abstract data type, the mechanism can invoke it in the form of a procedure or function call. The selection can be done in a static or dynamic fashion: an example of static selection is binding at system compile time, while with a dynamic linking technique the policy object can be bound to the mechanism layer at run time. In RT-Mach, all scheduling and synchronization policies are implemented as objects. Each object encapsulates the implementation of its policy, and the communication between a policy object and its mechanism is done through function calls, not message passing.

4.3.2 ITDS Scheduler

The ITDS scheduler provides an interface between the scheduling policies and the rest of the operating system. An object-oriented approach is used to implement the scheduler, with the policies embedded in policy objects; each instantiation of the scheduler may have a different scheduling policy governing its behavior. Since Mach supports a multiprocessor environment, ITDS allows a user to select a scheduling policy per processor set. A processor set is a user-defined object in Mach that binds threads to a set of physical processors; a special processor set, called the default processor set, exists, and before a new processor set is created all processors belong to it. Figure 3 shows a block diagram of the ITDS scheduler, indicating the relationship between the scheduler and the policy objects. The Mach scheduler uses three functions to control a run queue: thread_block, thread_choose, and thread_setrun. We changed these three primitives for the ITDS scheduler: they call an interface layer which translates them to the interface of the dispatch object. The dispatch object, which can be replaced for each target architecture, forwards the operation to the ITDS object attached to each processor set and provides control over preemption, for example which processor is preempted. On single processor machines only the default processor set is supported, so the system has only one ITDS object.

Interface: The ITDS object and the policy objects have the same interface; we list it below. The operations are called whenever the system encounters a scheduling event associated with a thread.

    kval_t     = itds_startup( pset )
    kval_t     = itds_shutdown( pset )

Figure 3: ITDS Scheduler
[The figure is a block diagram: the Mach scheduler interface (thread_block, choose_thread, thread_setrun) calls an interface object, which calls the dispatch object; the dispatch object forwards to the per-processor-set ITDS object through the ITDS interface, and the ITDS object invokes one of the policy objects RR, FP, RM, RMPOLL, or RMDS through the policy interface.]

    kval_t     = itds_run( thread, pset )
    kval_t     = itds_block( thread, pset )
    thread_t   = itds_choose( pset )
    kval_t     = itds_kill( thread, pset )
    boolean_t  = itds_csw_check( thread, pset )
    boolean_t  = itds_comp_priority( thread1, thread2, pset )
    int        = itds_map_priority( thread, max_priority, pset )
    preempt_t  = itds_timer( usec, pset )
    preempt_t  = itds_aperiodic_server( usec, pset )
    server_t   = itds_create_aperiodic_server( type, quantum, period, pset )
    kval_t     = itds_bind_aperiodic_server( server, thread, pset )

A scheduling policy directs not only the management of a run queue but also the priority ordering. Since ITDS encapsulates the priority management, the rest of the thread and synchronization management can easily be reused for other policies which use a different priority ordering. The above operations fall into five categories. The operations in the first category manage the binding of policy and mechanism: itds_startup binds a policy to the mechanism, and itds_shutdown unbinds it. The second category processes the run queue: itds_run enqueues a thread on the run queue, itds_block notifies the policy that a thread has blocked, itds_choose chooses the next runnable thread, and itds_kill removes a runnable thread from the run queue. The third category compares thread priorities: itds_csw_check checks whether the current thread needs to be preempted, itds_comp_priority compares the priorities of two threads, and itds_map_priority maps a priority to an integer value, where max_priority indicates the highest possible integer priority; if the mapping is impossible, it notifies the caller. This operation is used to build a priority queue. The fourth category is called periodically by the kernel: itds_timer processes the quantum of threads. The last category manages the aperiodic servers: itds_aperiodic_server resets the aperiodic servers at every period, itds_create_aperiodic_server creates a new aperiodic server, and itds_bind_aperiodic_server binds a thread to an aperiodic server.

Scheduling Policy Object: The current version of RT-Mach supports the following policy objects.

Round Robin[RR]: Threads are executed in a round-robin fashion.

Fixed Priority[FP]: The thread priority is assigned at creation time and is fixed. The highest priority thread is selected first.

Earliest Deadline First[EDF]: The thread with the earliest deadline is selected first.

Rate Monotonic with Background Server[RM]: The periodic threads are selected based on the rate monotonic scheduling algorithm, and aperiodic tasks are executed in background mode.

Rate Monotonic with Polling Server[RMPOLL]: Similar to RM, but the aperiodic tasks are executed whenever the polling server finds a runnable aperiodic task.

Rate Monotonic with Deferrable Server[RMDS]: Unlike RMPOLL, the deferrable server allows an aperiodic task to be executed at any time as long as the server's reserved execution time lasts.

As for the implementation of each policy object, we can take advantage of type inheritance to reduce the code size. For example, RMPOLL and RMDS inherit the operations of RM: the code size of RM is about 700 lines, while RMPOLL and RMDS are each about 400 lines, showing that a large part of the RM implementation is inherited by RMPOLL and RMDS. This scheme makes it easy to create a new policy.

4.3.3 Synchronization Module

Similar to the ITDS scheduler, the synchronization facility is divided into a common lock object and lock policy objects. While an ITDS object exists per processor set and controls several threads, a lock policy object is created for each lock object and controls the priority of the thread holding the lock. Figure 4 shows the relationship between the various lock policy objects and the common lock object. The common lock object offers the mechanism that manages the blocking and wakeup of threads for exclusive execution: it manipulates the queues of threads waiting for the lock to be released, and it calls the lock object and the ITDS object to reorder the waiting threads. The lock object then upcalls its associated lock policy object, which typically adjusts the thread's effective priority.

Interface: The operations of the lock object are triggered by a system call from a user. A thread executes these operations without blocking; if the thread must block inside one of them, control returns to the common lock object, which decides whether to block the thread or not and then re-executes the policy operations. The lock object has the following operations for each policy, and each policy object implements the same operations.

Figure 4: Lock Policy Module
[The figure is a block diagram relating the mutex syscall interface, the common lock object, the lock objects, and the lock policy objects BP, BPI, KM, and PCP.]

    kval_t = mutex_lock_acquired( lock )
    kval_t = mutex_lock_not_acquired( lock )
    kval_t = mutex_lock_unlock( lock )
    kval_t = mutex_lock_abort( lock )
    kval_t = mutex_lock_priority_changed( thread )

mutex_lock_acquired is called when the mechanism layer acquires the lock; this operation records the current thread's priority for the lock object. mutex_lock_not_acquired is used when the mechanism module cannot acquire the lock; in the case of the basic priority inheritance protocol, this operation inherits the higher priority thread's priority. mutex_lock_unlock is invoked from rt_mutex_unlock and resets the lock structure; in the case of priority inheritance, it restores the priority of the thread that executes the unlock. mutex_lock_abort is used when the lock is aborted; it is called in two situations — when a lock or unlock operation fails, and when a thread in the locked region needs to be aborted — and also when a timeout occurs. mutex_lock_priority_changed is called when a thread changes its priority, and is used in dynamic priority schemes.

Policy Module: A lock policy module is implemented as an object, similar to the scheduling policy object. The following lock policy objects are supported.

Basic Priority[BP]: All operations of this policy are null functions; all waiting threads are queued in the lock object based on the thread's priority.

Basic Priority Inheritance[BPI]: When the lock is contended, the lower priority thread executing the critical section inherits the priority of the higher priority thread.

Kernelized Monitor[KM]: While any thread is inside the kernelized monitor, all preemption is prevented. Thus, the duration of priority inversion is bounded by the size of the critical region.

Figure 5: Preemption during a System Call
[The figure is a timeline showing Thread A, executing a system call, being preempted by Thread B: the non-interrupt region (C_non_int), the interrupt handler (C_int_hdr, C_wakeup, C_int_left), the scheduler (C_sched_call, C_block, C_choose, C_dispatch), and the remaining system call work (C_sys_left).]

Priority Ceiling Protocol[PCP]: A thread's execution is blocked when the priority ceiling of the requested lock is not higher than the ceilings of all locks owned by other threads. The protocol prevents deadlock and chained blocking.

The lock policy object is easy to define. Among all the policies, the priority ceiling protocol is the largest policy module. The priority inheritance mechanism is used in both the basic priority inheritance protocol and the priority ceiling protocol, so the code can be shared between the two objects. The advantage of this approach is not only the reduction of code size, but also the clarity of the algorithm.

5 Evaluation and Schedulability Analysis

In this section, we measure the basic cost of our primitives and demonstrate schedulability analysis for four benchmark programs. We first measure the cost of managing real-time threads, synchronization, and scheduling policies; we then analyze the relation between these costs and the predictability of the system behavior. The benchmark programs are also presented to demonstrate that our approach offers better schedulability and responsiveness than a traditional scheduling policy such as fixed priority with a simple locking protocol.

5.1 Cost Analysis and Predictability

The basic costs of the ITDS scheduler and the synchronization primitives were measured on a Sony NEWS-1720 workstation (25 MHz MC68030) and a FORCE CPU-30 board (20 MHz MC68030). We evaluated a single processor environment on the Sony machine, using an accurate clock on the FORCE board, accessed over the VME-bus backplane, for timing measurements on the NEWS-1720. This clock enabled us to measure the overheads with a resolution of 4 μs.

5.1.1 Preemption Cost Analysis

Before we start analyzing the preemptability of the system, let us first define the basic cost factors. Figure 5 defines the basic cost factors when a higher priority thread preempts a lower priority thread which is executing a system call.

    Basic Operation    Cost (μs)
    C_wakeup              88  †1
    C_sched_call         136
    C_block               92  †1
    C_choose              80  †1
    C_dispatch            48
    C_null_trap           48
    C_clockint           376  †2

    †1 C_block, C_wakeup, and C_choose are measured under the fixed priority
       scheduling policy (the default) and are policy-specific numbers.
    †2 This includes the cost of calling the scheduling policy routines,
       but no thread wakeup cost.

    Table 1: The Basic Overhead

C_opr specifies the execution cost of the primitive opr. C_non_int is the worst case execution time of a non-interrupt region where all interrupts are masked; a critical interrupt may be delayed until the non-interrupt region is completed. C_int_hdr is the worst case execution time of the interrupt handlers; an interrupt handler can be interrupted by a higher priority interrupt. C_wakeup is the time to wake up a blocked thread. C_int_left is the remaining time after the wakeup until the interrupt is completed. C_sys_left is the remaining execution time of the system call. C_sched is the total scheduling delay time, the sum of C_sched_call, C_block, C_choose, and C_dispatch: C_sched_call is the delay time to switch to the scheduler, C_block is the blocking time for giving up the CPU, C_choose is the selection time for the next thread, and C_dispatch is the context switching time. The results of the measurements are summarized in Table 1.

Now, let us consider the preemption cost in RT-Mach. The total worst case preemption cost can be defined as

    C_preempt = C_non_int + C_int_hdr + C_wakeup + C_int_left + C_sys_left + C_sched
    C_sched   = C_sched_call + C_block + C_choose + C_dispatch

Under the fixed priority scheduling, the scheduling portion of C_preempt, C_wakeup + C_sched_call + C_block + C_choose + C_dispatch, becomes 444 μs. On our target machine, a clock interrupt handler alone requires at least an additional C_clockint = C_int_hdr + C_int_left = 376 μs. In a real-time application, the costs of C_int_hdr and C_int_left can be precomputed based on the system configuration; however, the costs of C_non_int and C_sys_left are operating system specific. In many monolithic kernel-based systems, C_non_int and C_sys_left become relatively high. In a micro kernel-based system, however, an ordinary system call becomes preemptible since its function is implemented in a user-level task. For real-time programs which have deadlines shorter than C_preempt, we need to reduce each cost factor further; to reduce C_non_int and C_sys_left, further kernelization of the current micro kernel is required.

5.1.2 Thread Management

The creation cost for a thread is defined as follows.

    C_create = C_resume + C_resource + C_stack

    Function                   WCET (μs)
    Create Periodic Thread     1668
    Create Aperiodic Thread    1668 + C_stack
    Terminate Itself            432 + C_block
    Terminate Other Thread     1732
    Exit Periodic Thread        356 + C_choose + C_block
    Restart Periodic Thread     820 + C_non_int + C_sys_left

    Table 2: Thread Management Cost

C_resume is the start-up cost for a thread, C_resource is the total resource allocation and initialization cost except the kernel stack allocation, and C_stack is the stack allocation cost. Table 2 shows the thread management overhead for real-time periodic and aperiodic threads. The cost of creating a real-time thread depends on whether the thread is periodic or aperiodic: aperiodic thread creation takes more time since it must allocate its kernel stack at creation time, whereas the stack for a periodic thread is preallocated to support hard real-time activities. Periodic thread creation cost can be reduced further if the thread does not need to start immediately. As the table shows, we should avoid a programming style in which a new aperiodic thread is created for every incoming external event; rather, the aperiodic thread should be precreated at system configuration time and simply wait for external events. Similarly, there is a limitation on creating periodic threads: on the target machine the interval clock interrupt occurs every 10 msec, so we cannot create a periodic thread whose period is shorter than this interval. The cost of periodic thread activation in each period is

    C_reincar = C_restart + C_exit

Here, C_reincar is the cost of reincarnation, which consists of C_restart and C_exit. C_restart is the cost of restarting the thread, which is the same as C_preempt, and C_exit is the cost of resetting the thread for the next reincarnation.

5.1.3 Scheduler Management

Achieving higher schedulability and responsiveness costs more than simple algorithms such as round robin scheduling, so each system should select the policy most suitable for its applications. For example, background aperiodic threads can be executed in FCFS, RR, or FP order using an aperiodic server with the rate monotonic policy. Selecting a suitable scheduling policy supported by the ITDS scheduler makes the system flexible and open, because a system designer can build his own optimized scheduler for his application. We measured the additional scheduling overhead caused by the policy/mechanism separation of the ITDS scheduler. Table 3 compares the scheduling overhead of the default fixed priority scheduler in the original Mach kernel and our modified RT-Mach kernel. The results indicate that both versions have the same thread blocking cost; this is due to our Mach scheduling interface layer being optimized for a single processor architecture.

    Scheduling Operations   Mach Scheduler (μs)   RT-Mach Scheduler (μs)
    thread_block                   172                    172
    thread_setrun                   52                     88
    thread_choose                   20                     80

    † Both schedulers use the fixed priority policy.

    Table 3: The Basic Scheduling Cost

    Operation            Fixed Priority (μs)   Rate Monotonic (μs)   RM with Deferrable Server (μs)
    itds_block                   12                  12                    12/24 †1
    itds_run                     24                  36 + n × 12 †2        36 + n × 12
    itds_choose                  24                  24                    24/78 + m × 8 †3
    itds_comp_priority           12                  24/36 †4              24/36 †4
    itds_csw_check               12                  12                    12/36 †1

    †1 The difference depends on whether the current thread is bound to the aperiodic server or not.
    †2 n is the length of the run queue.
    †3 m is the number of aperiodic servers.
    †4 The difference is due to a periodic versus an aperiodic thread.

    Table 4: A Comparison of Scheduling Policies in the ITDS Scheduler

However, thread_setrun costs about 1.7 times, and thread_choose about 4 times, more than in the original Mach scheduler. Most of the additional cost is due to the policy/mechanism separation. In practice the overhead is dominated by thread_setrun, since the thread_choose function is always called from thread_block in RT-Mach; the difference between the two systems is proportional to the number of thread_setrun calls.

Table 4 compares the implementation costs of the fixed priority, rate monotonic, and rate monotonic with deferrable server policies. The scheduling overhead alone cannot determine the effectiveness of a scheduling policy, but these results give an estimate of the worst case execution time of each policy. Because of the 32-level priority domain, the fixed priority policy can execute all internal scheduling functions in constant time: each priority is mapped to an index into an array of run queues. In contrast, the rate monotonic policy uses a thread's period as its priority, and the run queue is implemented as a priority queue using a linked list; thus the cost of making a thread runnable and of choosing the next thread is not constant but proportional to the number of runnable threads. The rate monotonic with deferrable server policy incurs additional selection cost over pure rate monotonic, since it must also manage the quantum of the aperiodic server.

5.1.4 Synchronization Management

A user can select a suitable lock protocol for an application using a lock attribute. The locking cost for a lock variable is defined as follows.

                         Kernelized       Basic          Priority
                         Monitor (μs)   Locking (μs)   Inheritance (μs)
    Lock syscall             100            100             100
    Lock Acquired             40             36              88
    Lock Not Acquired        N.A. †1        120             148 + m × 12 †2
    Unlock                    40             36/144 †3       72/180 + n × 12 †3 †4

    †1 No one can preempt.
    †2 m is the number of chains of chained locks.
    †3 The cost on the right includes the cost of waking up the thread waiting for the lock.
    †4 n is the nesting depth of nested locks.

    Table 5: The Overhead of Locking Primitives

    C_lock = C_lock_sys_int + C_lock_acquired + C_lock_wait

C_lock_sys_int is the trap and argument copying time, C_lock_acquired is the initialization time for the lock variable, and C_lock_wait is the total execution time for blocking and wakeup. Table 5 compares the three locking protocols: the basic (default) policy, the kernelized monitor, and priority inheritance. The priority inheritance protocol takes more time than the kernelized monitor in all costs except C_lock_sys_int; this additional cost comes from keeping track of the relation between locks and their owning threads. The schedulability analysis formula shown in Section 3.3 indicates that the schedulability of priority inheritance is better than that of the kernelized monitor, so, given the basic overhead of the priority inheritance lock, we need to determine the range in which the kernelized monitor is suitable. When using kernelized monitors, it is impossible to create a periodic thread whose period is shorter than the critical section. However, if the critical section is reasonably short, a kernelized monitor can be the better choice because its cost is lower than that of the priority inheritance lock. Let us calculate the overhead of both locking policies and determine which protocol is better for which size of critical region. Here, C_nullcs is the time to execute a pair of lock and unlock operations on an empty critical section.

    C_nullcs^km(1) = C_syscall^km + C_acquired^km + C_unlock^km = 180 μs
    C_nullcs^pi(1) = C_syscall^pi + C_acquired^pi + C_unlock^pi = 260 μs
    C_wait^km      = C_itds_csw_check = 12 μs
    C_wait^bpi(n)  = (n − 1) × (C_sched + C_lock_not_acquired + C_wakeup) = (n − 1) × 588 μs

We introduce the worst case overhead when a lock is shared by n threads in the following way, where the wait term is the overhead for the context switching of lock i when a thread executes the unlock primitive. Then the overhead of each lock is:

    C_lock^km  = C_nullcs^km(n)  + C_wait^km     = 192 μs
    C_lock^bpi = C_nullcs^bpi(n) + C_wait^bpi(n) = 260 + (n − 1) × 588 μs

    Thread      Period (ms)   Worst Case Exec (ms)   Start Time (ms)   Utilization (%)   Priority specified by user
    Thread A        100              20                    20                10                    2
    Thread B        200              50                    10                50                    1
    Thread C        400              60                     0                10                    3

    Table 6: Parameters of Threads in Benchmark 1

The kernelized monitor checks for a runnable thread with higher priority by calling itds_csw_check. When the priority inheritance lock is used and n higher priority threads execute the critical region simultaneously, only one thread can enter the critical region while the remaining n − 1 threads wait for the lock to be released; thus n − 1 preemptions may occur in the worst case. The difference in overhead between the two locking protocols is

    C_nullcs^pi(n) − C_nullcs^km(n) = 80 + (n − 1) × 588 μs

The result shows that if the length of the critical section is shorter than 80 + (n − 1) × 588 μs, the kernelized monitor is better than the priority inheritance lock in RT-Mach.

5.2 Benchmark and Schedulability Analysis

This section presents four benchmarks evaluating the effect of priority inversion and of the aperiodic server algorithms. The first benchmark compares the schedulability of different locking protocols under priority inversion; the last compares the worst case response time of aperiodic threads under different aperiodic servers. In the following sections we use the notation C_i, B_i, T_i, U_i, and WCRT_i: C_i is the worst case execution time of thread i, B_i is the worst case blocking time of thread i, T_i is the period of thread i, and U_i is the processor utilization of thread i. Also, WCRT_i is the worst case response time of aperiodic thread i.

5.2.1 The Effect of Priority Inversion

This benchmark shows a simple case where user-defined ad-hoc priority assignment and FIFO queueing of locks cause priority inversion and degrade schedulability. There are three periodic threads, Th_A, Th_B, and Th_C, which share the same object. At the beginning of its execution, each thread locks the shared object, and it releases the object at the end. The timing attributes of each thread are given in Table 6.

We evaluated three test cases, each using a different scheduling policy. The first case uses round-robin scheduling and FIFO queueing for the lock object. The second case uses fixed priority scheduling and priority-based queueing, with the priority assigned by a user as specified; note that this priority assignment is not equivalent to the rate monotonic case. The third case uses rate monotonic scheduling and priority-based queueing.

In both the test 1 and test 2 cases, the worst case blocking time of Th_A becomes B_A = C_B + C_C, because Th_A may be blocked by both Th_B and Th_C due to priority inversion. However, in test 3, Th_A's worst case blocking time is bounded by max(C_B, C_C), since the priority-based queueing avoids all priority inversion cases. Using the exact characterization formula shown in Section 3.3, we can verify that the test 1 and test 2 cases are not schedulable, but test 3 is. We also executed this benchmark program on RT-Mach: Th_A sometimes missed its deadline in tests 1 and 2, but all deadlines were satisfied under test 3.

    Thread      T_i (ms)   C_i (ms)   Start Time (ms)   Utilization (%)
    Thread A      100         10            20                10
    Thread B      400         60            10                15
    Thread C      300        EXE            10              EXE/300
    Thread D     1000         30             0                 3

    Table 7: Parameters of Threads in Benchmark 2

This result also indicated that priority inversion was caused by FIFO queueing or ad-hoc priority assignment; thus, systematic priority assignment and priority-based queueing are important for avoiding priority inversion.

5.2.2 Schedulability Analysis under Priority Inversion

The second benchmark compares the schedulability of periodic threads under different locking protocols: priority locking, the priority inheritance protocol, and the kernelized monitor. The benchmark shows that a priority queue alone is not sufficient. There are four periodic threads: Th_A, Th_B, Th_C, and Th_D. Th_A has the highest priority, Th_B and Th_C are medium, and Th_D has the lowest priority. Th_D is executed first, Th_B and Th_C are executed next, and lastly Th_A is executed. Th_A and Th_D share the same object: at the beginning of its execution, each thread locks the shared object, and it releases the object at the end. The timing attributes of each thread are given in Table 7. In the benchmark, for each locking protocol, we measured the breakdown processor utilization while varying the execution time of Th_C (i.e., EXE in Table 7).

Now, let us analyze the schedulability of the given threads. From the exact characterization analysis formula shown in Section 3.3, we calculated the worst case blocking time of each thread. In the case of priority locking, we can derive B_A = C_B + C_C + C_D = 100 ms, B_B = 30 ms, B_C = 30 ms, and B_D = 0 ms in the worst case blocking scenario: due to priority inversion, Th_A was blocked until Th_B, Th_C, and Th_D completed. In contrast, we get B_A = B_B = B_C = C_D under both the priority inheritance protocol and the kernelized monitor, because Th_D inherits the priority of Th_A, and Th_B and Th_C cannot preempt Th_D. We calculated the maximum execution time EXE using the exact characterization formula. The result was EXE = 180 ms under priority inheritance and the kernelized monitor, and EXE = 0 ms under priority locking. The measured results were EXE = 170 ms under priority inheritance and the kernelized monitor, and EXE = 10 ms under priority locking. Table 8 shows the measured results of benchmark 2. The results demonstrate that priority inversion degrades schedulability significantly.

Next, let us analyze the difference between the predicted and the measured results. Under priority inheritance and the kernelized monitor, the measured and predicted results differ by about 10 ms. This difference was caused by the clock interrupts, the reincarnation cost, and the preemption cost. If we assume EXE = 170, the result can be recalculated as follows; here the clock interrupts occurred every 10 ms, and Th_A preempted Th_C twice per its period.
    EXE′ = 170 + C_clockint × 17 + C_preempt × 2 + C_reincar = 178.6

    Scheduling Policy   Lock Policy                  Max Exec Time (ms)   Utilization (%)
    Rate Monotonic      Priority Queue                       10                 31
    Rate Monotonic      Basic Priority Inheritance          170                 85
    Rate Monotonic      Kernelized Monitor                  170                 85

    Table 8: The Result of Benchmark 2

    Thread      Period (ms)   Worst Case Exec (ms)   Start Time (ms)   Utilization (%)
    Thread A        100               20                   10                30
    Thread B        200              EXE                    0              EXE/200

    Table 9: Parameters of Threads in Benchmark 3

This adjusted result, 178.6, is very close to the predicted result of 180; from this, we could predict the actual system behavior from the schedulability analysis. Under priority locking, the measured result was EXE = 10 ms, but the predicted result was 0 ms. This was caused by the difference in the start times of Th_A and Th_B, Th_C: Th_B and Th_C were executed 10 ms before Th_A, so the actual blocking time became C_B + C_C − 10 ms. Compensating the predicted result by 10 ms makes the two results equivalent.

The third benchmark shows the difference in schedulability between priority inheritance and the kernelized monitor. In this test, we consider two threads, Th_A and Th_B. Th_A is the highest priority thread, and it is started 10 ms after Th_B starts. The timing attributes are given in Table 9. Th_B has the critical section, and we vary the length of the critical section. In the priority inheritance case, we can calculate the schedulability in the same way as in the discussion of benchmark 2. The difference is that we do not need the worst case blocking time when priority inheritance is used, because the critical region is not shared with another thread; but we must consider the worst case blocking time when the kernelized monitor is used, because it blocks all other threads during the critical section.

Let us compare the results from our analysis and from the actual system. Under the priority inheritance protocol, B_A became 0. Considering just Th_A, we could derive C_A/T_A = 0.3 < 1 from the formula. Next, we considered both Th_A and Th_B. The result is EXE = 160 ms from the exact characterization formula under priority inheritance, and EXE = 80 ms under the kernelized monitor. The measured results from the benchmark are shown in Table 10. By the same compensation as in benchmark 2, the measured and predicted results can be reconciled.
The results indicate that the basic priority inheritance protocol achieves higher schedulability than the kernelized monitor. From this result and the results of Section 5.1.4, we conclude that when the critical section becomes long, the priority inheritance protocol is better than the kernelized monitor.

5.2.3 Response Time Analysis for Aperiodic Threads

The benchmark consists of two threads: the first is a periodic thread, and the second is an aperiodic thread. We vary the quantum of the aperiodic server and measure the response time of the aperiodic thread. Thread B becomes runnable 30 ms after the thread set is started. Table 11 shows the thread set of the benchmark.

Scheduling Policy   Lock Policy            Max Exec (ms)   Utilization (%)
Rate Monotonic      Priority Inheritance   152             96
Rate Monotonic      Kernelized Monitor     83              71

Table 10: The Result of Benchmark 3

Thread     Period (ms)   Worst Case Exec (ms)   Start Time (ms)   Utilization (%)
Thread A   500           300                    0                 60
Thread B   100           1                      30                3.3
Thread A   500           300                    0                 60
Thread B   200           1                      30                3.3

Table 11: Parameters of Threads in Benchmark 4

The rate monotonic algorithm can schedule aperiodic threads using aperiodic servers. The benchmark considers the response time of thread B when each of the aperiodic servers is used. The following formulas give the predicted response time under the respective aperiodic servers, where AT_i is the arrival time of thread i and REM_i(e) is the remaining execution time of thread i at the time event e occurs.

WCRT_background = REM_A(AT_B) + WCET_B = 271 ms
WCRT_Polling    = round(AT_B; T_A)
WCRT_Deferrable = WCET_B = 1 ms
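These predictions can be checked with a small sketch using the Table 11 parameters (thread A: 300 ms worst-case execution starting at time 0; thread B: 1 ms execution arriving at 30 ms). The assumption that the polling server's period equals its quantum is ours, not stated in the text:

```python
# Sketch of the three response-time predictions above (our own code,
# not the RT-Mach analysis tool).  Times are in milliseconds.
import math

WCET_A, START_A = 300, 0   # periodic thread A
WCET_B, AT_B = 1, 30       # aperiodic thread B

def wcrt_background():
    # Background server: B runs only after A completes, so the response
    # is A's remaining time at B's arrival plus B's own execution time.
    rem_a = (START_A + WCET_A) - AT_B
    return rem_a + WCET_B

def wcrt_polling(period):
    # Polling server: B must wait for the next polling point, then runs.
    next_poll = math.ceil(AT_B / period) * period
    return (next_poll - AT_B) + WCET_B

def wcrt_deferrable():
    # Deferrable server: capacity is preserved, so B runs on arrival.
    return WCET_B

print(wcrt_background())   # 271
print(wcrt_polling(100))   # 71
print(wcrt_deferrable())   # 1
```

The background value matches the 271 ms prediction, and the deferrable value matches the 1 ms prediction; the polling value falls between them and shrinks as the quantum shrinks, which is the trend Figure 6 shows.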

The background server delays the execution of thread B until thread A completes. The polling server allows thread B to execute at each polling interval. The deferrable server allows thread B to execute as soon as it becomes runnable. The results are shown in Figure 6. We measured the response time while varying the quantum of the aperiodic server; the predicted and measured results are very close.

6 Related Work

The advantages of using a micro kernel instead of a standard monolithic kernel are its high preemptability, small size, and extensibility. However, only a few micro kernels were designed to support a predictable distributed real-time computing environment. For instance, Chorus's micro kernel [24] was designed for real-time applications, but the emphasis was placed on rather low-level kernel functions, such as providing user-defined interrupt handlers and a preemptive kernel. The kernel uses a wired-in fixed-priority preemptive scheduling policy, and no additional features have been reported to avoid priority inversion problems. The V kernel was also intended to support high-speed real-time applications [5]. V's optimized message-passing mechanism and VMTP protocol can provide basic functions for

[Figure 6: Response Time Analysis for Aperiodic Threads. WCRT (msec), 0-450, versus the quantum of the aperiodic server (msec), 100-400, for the background, polling server, and deferrable server cases.]

building distributed real-time applications. However, the wired-in scheduling policy and locking protocol may cause a potential priority inversion problem. Amoeba's advantage is its high-performance RPC, which was used for remote video image transmission over Ethernet. Like RT-Mach, Amoeba can support a set of single-board computers without local disks; however, it does not provide any safe mechanism for creating a periodic thread or for avoiding priority inversion problems. Yet another unique aspect of RT-Mach is that a user can use a set of real-time tools, Scheduler 1-2-3 and ARM, which allow the user to analyze, simulate, and monitor a real-time task set [28, 29, 30].

The proposed real-time thread model is different from many other thread models. In particular, 1) our model distinguishes between real-time and non-real-time threads, 2) we provide explicit timing constraints for each real-time thread, 3) a scheduling policy for a real-time application can be selected, and 4) various locking protocols are provided to avoid unbounded priority inversion. The POSIX-Thread proposal [16] is very similar to Mach's C-Thread package [7], and it also does not distinguish between real-time and non-real-time threads. This poses the problem of identifying which threads have a hard deadline or can pin down their memory objects. However, it can dynamically select the thread scheduling policy, and a thread also carries attributes such as inherited priority, scheduling priority, scheduling policy, and minimum stack size. Thus, adding our timing attributes to the proposed POSIX interface would be very simple. The Ultrix-Thread model [6] does not address real-time thread issues; however, its designers intended to create much lighter threads by keeping as much of a thread's context information as possible at the process level.
Thus, a new thread can be created by specifying the thread's stack page and guard page addresses: tfork(stack_ptr, guard_ptr). The Topaz-Thread model [19] provides a clean thread interface library at the Modula-2+ language level; however, it does not address real-time thread issues. The notion of policy/mechanism separation was first introduced in the development of the

Hydra operating system at CMU [14]. Hydra implemented a scheduler which consults user-level policy modules. In the Intel 432 system, the scheduling policy was implemented in software while the dispatching mechanism was realized in hardware [15]. Unlike these previous attempts, RT-Mach demonstrated that a wide range of processor scheduling policies can be implemented using our class of scheduling policy objects. RT-Mach currently supports round robin, fixed priority, earliest deadline first, rate monotonic, rate monotonic with polling server, and rate monotonic with deferrable server, and other policies such as the deadline monotonic and sporadic server algorithms can easily be added. In the original Mach kernel, yet another policy/mechanism separation was adopted to provide a flexible processor set allocation scheme [3]: the kernel provides processor allocation mechanisms, and a CPU server implements an allocation policy. Our scheduling policy can be selected independently of the processor set allocation policy. The use of different types of locking protocols, such as the static and dynamic priority ceiling protocols, is well analyzed in the real-time computing field. In RT-Mach, we have implemented the static PCP for lock variables, since it is not easy to support a pure deadline-driven scheduling policy for distributed applications; at this point, no media access protocol can contend for a shared medium based on such deadline information.
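The policy/mechanism separation described above can be illustrated with a short sketch. This is a hypothetical Python illustration, not the RT-Mach scheduling policy object interface; all class and method names are ours:

```python
# Hypothetical sketch of policy/mechanism separation: the dispatcher
# (mechanism) only asks a pluggable policy object which runnable thread
# to run next, so new policies can be added without touching it.

class Thread:
    def __init__(self, name, priority=0, period=None):
        self.name, self.priority, self.period = name, priority, period

class FixedPriorityPolicy:
    def select(self, runnable):
        # Highest numeric priority wins.
        return max(runnable, key=lambda t: t.priority)

class RateMonotonicPolicy:
    def select(self, runnable):
        # Rate monotonic: the shorter the period, the higher the priority.
        return min(runnable, key=lambda t: t.period)

class Dispatcher:
    """The mechanism: context switching would live here; the choice of
    which thread runs next is delegated to the installed policy object."""
    def __init__(self, policy):
        self.policy = policy
    def next_thread(self, runnable):
        return self.policy.select(runnable)

threads = [Thread("A", priority=5, period=500),
           Thread("B", priority=1, period=100)]
print(Dispatcher(FixedPriorityPolicy()).next_thread(threads).name)  # A
print(Dispatcher(RateMonotonicPolicy()).next_thread(threads).name)  # B
```

Swapping the policy object changes which thread is dispatched without any change to the dispatcher itself, which is the essence of the separation.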

7 Conclusion and Further Work

We reported that, using the new real-time thread, synchronization, and scheduler facilities in Real-Time Mach, a user can eliminate unbounded priority inversion problems and can perform schedulability analysis for real-time programs in a single-CPU environment. The explicit use of timing constraints in defining a real-time thread was very effective for performing schedulability analysis, and the system interface for creating periodic and aperiodic threads was natural for defining various real-time activities. Policy/mechanism separation in the real-time scheduler has been a key issue for many real-time system designers; we demonstrated that many scheduling policies, such as round robin, fixed priority, earliest deadline first, rate monotonic, rate monotonic with polling server, and rate monotonic with deferrable server, can be implemented using our scheduling policy object. We also demonstrated that various locking protocols are effective in eliminating unexpected delay and unbounded priority inversion problems. Real-Time Mach has been operating satisfactorily on a network of SUN and SONY machines, a laptop, and single-board computers.

So far, our focus has been on eliminating various types of unbounded priority inversion and unpredictable runtime behavior among real-time programs in a single-CPU environment. We must go on to extend our facilities to support distributed real-time applications. Our plan is to remodel the Mach IPC facility so that it can support priority-based message handling and different types of transport protocols for real-time message transmission. The basic issues have been discussed and evaluated in our experimental testbed [31, 32]. For instance, a priority inversion may occur when a client requests a service from a non-preemptive server: a higher-priority client may face an unbounded priority inversion problem at the server.
To avoid this problem, one solution is to add a port attribute to each port and allow a different priority inheritance policy to be set, like the lock attribute we implemented. Real-time network support is also important, and we are planning to create a new netserver facility in which protocol processing will be performed by multiple worker threads based on message priority.

References

[1] M.J. Accetta, W. Baron, R.V. Bolosky, D.B. Golub, R.F. Rashid, A. Tevanian, and M.W. Young, "Mach: A new kernel foundation for UNIX development", In Proceedings of the Summer USENIX Conference, July, 1986.

[2] Ozalp Babaoglu, "Fault-Tolerant Computing Based on Mach", In Proceedings of USENIX Mach Workshop, October, 1990.

[3] David L. Black, "Scheduling support for concurrency and parallelism in the Mach operating system", IEEE Computer, Vol.23, No.5, 1990.

[4] R. Chen and T.P. Ng, "Building a Fault-Tolerant System Based on Mach", In Proceedings of USENIX Mach Workshop, October, 1990.

[5] D.R. Cheriton, G.R. Whitehead, and E.D. Sznyter, "Binary emulation of UNIX using the V Kernel", In Proceedings of the Summer USENIX Conference, June, 1990.

[6] D.S. Conde, F.S. Hsu, and U. Sinkewicz, "Ultrix threads", In Proceedings of the Summer USENIX Conference, June, 1989.

[7] E.C. Cooper and R.P. Draves, "C threads", Technical Report CMU-CS-88-154, Computer Science Department, Carnegie Mellon University, March, 1987.

[8] J. Epstein and M. Shugerman, "A Trusted X Window System Server for Trusted Mach", In Proceedings of USENIX Mach Workshop, October, 1990.

[9] A. Goldberg, A. Gopal, K. Li, R. Storm, and D. Bacon, "Transparent Recovery of Mach Applications", In Proceedings of USENIX Mach Workshop, October, 1990.

[10] D. Golub, R. Dean, A. Forin, and R. Rashid, "Unix as an application program", In Proceedings of the Summer USENIX Conference, June, 1990.

[11] G.J. Henry, "The Fair Share Scheduler", AT&T Bell Laboratories Technical Journal, Vol.63, No.8, October, 1984.

[12] Mark Heuser, "An implementation of real-time thread synchronization", In Proceedings of the Summer USENIX Conference, June, 1990.

[13] P. Hood and V. Grover, "Designing real time systems in Ada", Tech Report 1123-1, SofTech, Inc., January, 1986.

[14] R. Levin, E. Cohen, W. Corwin, F. Pollack, and W. Wulf, "Policy/Mechanism Separation in HYDRA", In Proceedings of the 5th Symposium on Operating Systems Principles, November, 1975.

[15] E.I. Organick, "A Programmer's View of the Intel 432 System", McGraw-Hill, 1983.

[16] IEEE, "Realtime Extension for Portable Operating Systems", P1003.4/Draft 6, February, 1989.

[17] J.P. Lehoczky, L. Sha, and Y. Ding, "The rate-monotonic scheduling algorithm: Exact characterization and average case behavior", Department of Statistics, Carnegie Mellon University, 1987.

[18] C.L. Liu and J.W. Layland, "Scheduling algorithms for multiprogramming in a hard real time environment", Journal of the ACM, Vol.20, No.1, 1973.

[19] P. McJones and G. Swart, "Evolving the UNIX system interface to support multithreaded programs", Tech Report 21, Part I, DEC SRC, September, 1987.

[20] A.K. Mok, "Fundamental Design Problems of Distributed Systems for the Hard-Real-Time Environment", PhD thesis, Massachusetts Institute of Technology, May, 1983.

[21] S.J. Mullender, G. van Rossum, A.S. Tanenbaum, R. van Renesse, and H. van Staveren, "Amoeba: A Distributed Operating System for the 1990s", IEEE Computer, Vol.23, No.5, May, 1990.

[22] R. Rajkumar, "Task Synchronization in Real-Time Systems", Ph.D. Dissertation, Carnegie Mellon University, August, 1989.

[23] F. Ross, "FDDI - A Tutorial", IEEE Communications Magazine, May, 1986.

[24] M. Rozier, V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrmann, C. Kaiser, S. Langlois, P. Leonard, and W. Neuhauser, "Chorus distributed operating system", Computing Systems Journal, The USENIX Association, December, 1988.

[25] L. Sha, R. Rajkumar, and J.P. Lehoczky, "Priority inheritance protocols: An approach to real-time synchronization", Technical Report CMU-CS-87-181, Carnegie Mellon University, November, 1987.

[26] B. Sprunt, L. Sha, and J.P. Lehoczky, "Aperiodic Task Scheduling for Hard-Real-Time Systems", The Journal of Real-Time Systems, Vol.1, No.1, 1989.

[27] J.A. Stankovic, "Misconceptions about real-time computing: A serious problem for next-generation systems", IEEE Computer, Vol.21, No.10, October, 1988.

[28] H. Tokuda, M. Kotera, and C.W. Mercer, "A real-time monitor for a distributed real-time operating system", In Proceedings of the ACM SIGOPS and SIGPLAN Workshop on Parallel and Distributed Debugging, May, 1988.

[29] H. Tokuda and M. Kotera, "Scheduler 1-2-3: An interactive schedulability analyzer for real-time systems", In Proceedings of COMPSAC 88, October, 1988.

[30] H. Tokuda and M. Kotera, "A real-time tool set for the ARTS kernel", In Proceedings of the 9th IEEE Real-Time Systems Symposium, December, 1988.

[31] H. Tokuda and C.W. Mercer, "ARTS: A distributed real-time kernel", ACM Operating Systems Review, Vol.23, No.3, July, 1989.

[32] H. Tokuda, C.W. Mercer, Y. Ishikawa, and T.E. Marchok, "Priority inversions in real-time communication", In Proceedings of the 10th IEEE Real-Time Systems Symposium, December, 1989.

[33] H. Tokuda, T. Nakajima, and P. Rao, "Real-Time Mach: Towards a Predictable Real-Time System", In Proceedings of USENIX Mach Workshop, October, 1990.

