Efficient Timing Management for User-Level Real-Time Threads

Shuichi Oikawa†
[email protected]

Hideyuki Tokuda†‡
[email protected]

†Faculty of Environmental Information, Keio University
‡School of Computer Science, Carnegie Mellon University

Abstract

Timing management for user-level real-time threads can be done with appropriate support from the kernel. When a specified time comes, the kernel makes a virtual processor upcall a user-level scheduler for its timing management. That timing management can then suffer from the overhead of the user-level scheduler. This paper presents an efficient timing management mechanism for user-level real-time threads. By sharing user-level timers through a shared kernel/user structure and keeping appropriate hints for them, redundant processing of timers and events can be eliminated. The results of the performance evaluations show that the upcall performance of our user-level real-time threads is comparable to, and more stable than, that of kernel-provided real-time threads.

1 Introduction

User-level real-time threads have been developed to support continuous media applications [7]. Such applications require more efficient and flexible real-time thread support than advanced real-time systems [9, 10] provide. Threads are the scheduling entities to which the scheduler allocates CPU resources. They are separate from tasks (or processes), the entities to which the kernel allocates other resources such as a virtual address space and memory regions. A number of threads can run in a single task and share its resources. User-level threads are threads implemented at the user level. They are created and scheduled by a user-level scheduler (ULS) which resides at the user level. There must be at least one kernel-provided thread that executes the user-level threads. Real-time threads are threads with timeliness requirements. Such requirements are specified in thread attributes, which include timing attributes such as deadline and periodic instantiation information. Since timing is generally managed by the kernel, real-time threads have been realized as kernel-provided threads. Provision of real-time threads at the user level enables flexible structuring of applications. For example, they can

make dynamic rebinding of thread attributes fast and support of multiple thread models easy. It also enables their efficient implementation, since their creation, context switching, and termination can be performed at the user level without kernel intervention. Timing management for user-level real-time threads can be done at the user level through appropriate support of the kernel. When a specified time comes, the clock interrupt handler notifies a ULS; the ULS then processes sleeping user-level real-time threads and user-level timers. Such timing management can suffer from the overhead of the ULS. In this paper, we present a mechanism which reduces the cost of such timing management. We implemented this mechanism in the RT-Mach real-time microkernel [10] and in a user-level real-time threads library, called RTC-Threads. The results of several benchmarks show that the performance of RTC-Threads becomes comparable to that of kernel-provided threads even when the execution of a user-level real-time thread involves a kernel intervention.

2 Previous Work

When a user-level thread blocks in the kernel, care must be taken not to block other runnable threads belonging to the same task as the blocked thread. First-class user-level threads were developed toward this end; examples include Scheduler Activations [1] and the first-class user-level threads [6] of the Psyche operating system. Both are implemented on parallel computers to exploit the parallelism of the underlying hardware. Since parallel applications run mainly at the user level, upcalling a ULS does not occur as often as it does for user-level real-time threads.¹ This means that the performance of upcall and event processing is not very important for such applications. The upcall performance of the Scheduler Activations implementation is, in fact, considerably slower than that of kernel-provided threads.

¹ This paper uses "upcall" to mean that the kernel calls a software module at the user level; in this case, the execution mode of the program is switched to user mode. It is called a "software interrupt" in Psyche.

Split-level scheduling [4] provides user-level real-time threads through shared kernel/user structures used by split kernel-level and user-level schedulers. It is implemented on a uniprocessor, and shared memory is used extensively to pass information between a ULS and the kernel. Each user-level thread has a logical arrival time and a deadline, and the threads are scheduled by the deadline/workahead scheduling policy based on their timing attributes. Split-level scheduling was developed to handle continuous media efficiently. Timing management is done cooperatively by the kernel-level and user-level schedulers using the threads' attributes. It does not, however, provide a fast timing management mechanism like ours. When several sleeping threads in a single task wake up at the same time, a ULS first makes all of them runnable; only then does it select the thread with the highest priority and invoke it. The mechanism proposed in this paper makes it possible to invoke the thread with the highest priority without being disturbed by lower-priority threads. The close cooperation of the kernel and a ULS can also eliminate the duplicated processing of timers and events found in split-level scheduling.

3 Software Architecture

In this section, we first show the software architecture of user-level real-time threads and then describe their components in detail. Since we intend to support multiple thread models, we tried to make our architecture and mechanism independent of the existing software platform. Figure 1 shows the overall architecture and the elements which compose user-level real-time threads. In a task, there can be multiple user-level threads and user-level timers. User-level timers may be attached to a user-level thread to control its start time, period, and deadline. User-level threads and timers are managed by a ULS which resides in each task. The major components relevant to timing management, namely virtual processors, the kernel/user shared memory, and user-level timers, are described below.

3.1 Virtual Processor

User-level threads are multiplexed on a single kernel-provided thread, which is called a virtual processor (VP). There are three types of VPs:

• A current VP is the VP currently executing user-level threads in an address space. Only this type of VP can run at the user level.

• A kernel VP is attached to a specific user-level thread which is blocked in the kernel. It executes only in the kernel, because user-level threads running at the user level must be multiplexed on the current VP.

• Reserved VPs are waiting to become the current one. When a ULS initializes its state, it asks the kernel to create them. One of them is used when the current VP blocks in the kernel. They are introduced to avoid the unpredictable delay of dynamically creating a new virtual processor.

Figure 1: Overall Architecture (a task holding user-level threads, user-level timers, and the ULS; the kernel/user shared memory; and the kernel threads acting as the current VP and reserved VPs, driven by the kernel timer used as an alarm clock)

When a user-level thread is blocked in the kernel, the current VP, which is executing the blocked thread, becomes a kernel VP. Then one of the reserved VPs is taken from the list and becomes the current VP. Finally, the new current VP upcalls the ULS.

When a kernel VP is unblocked, it is scheduled by the kernel-level scheduler independently of the current VP. When the kernel VP is about to exit the kernel, it passes two execution contexts to the ULS: one for the user-level thread on the kernel VP, and another for the user-level thread on the current VP, which is preempted by the kernel VP. Then the current VP is linked into the list of reserved VPs, and the kernel VP becomes the new current VP. Finally, the new current VP upcalls the ULS.
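For illustration, the handoff performed when the thread on the current VP blocks can be sketched as follows. This is a minimal sketch, not RT-Mach code: the types (vp, vp_role, task_vps) and the function block_current_vp are hypothetical names, and list handling is reduced to the bare minimum.

    /* Illustrative sketch of the VP roles and the handoff when a
     * user-level thread blocks in the kernel.  All names are assumed. */
    #include <stddef.h>
    #include <stdio.h>

    enum vp_role { VP_CURRENT, VP_KERNEL, VP_RESERVED };

    struct vp {
        enum vp_role role;
        int          id;
        struct vp   *next;          /* link in the reserved-VP list */
    };

    struct task_vps {
        struct vp *current;         /* the only VP that runs at the user level */
        struct vp *reserved;        /* pre-created VPs, avoiding dynamic creation */
    };

    /* Conceptually executed inside the kernel when the thread running on
     * the current VP blocks: the current VP turns into a kernel VP, a
     * reserved VP is promoted, and the new current VP upcalls the ULS.  */
    static struct vp *block_current_vp(struct task_vps *t)
    {
        struct vp *blocked = t->current;
        struct vp *next    = t->reserved;

        if (next == NULL)
            return NULL;            /* no reserved VP left */

        blocked->role = VP_KERNEL;  /* stays in the kernel with the blocked thread */

        t->reserved = next->next;   /* pop one reserved VP */
        next->role  = VP_CURRENT;
        t->current  = next;

        printf("VP %d blocked in the kernel, VP %d upcalls the ULS\n",
               blocked->id, next->id);
        return next;
    }

    int main(void)
    {
        struct vp r = { VP_RESERVED, 2, NULL };
        struct vp c = { VP_CURRENT,  1, NULL };
        struct task_vps t = { &c, &r };

        block_current_vp(&t);       /* VP 1 blocks, VP 2 becomes current */
        return 0;
    }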

3.2 Kernel/User Shared Memory

The kernel and a ULS share a data structure in a kernel/user shared memory region. The structure contains the information which needs to be communicated between the kernel and a ULS: an event buffer, current user-level thread information, user-level timer data structures, and lock variables. The shared memory provides a simple way to pass multiple events asynchronously; event notification is therefore decoupled from upcalls. This feature is important especially for user-level real-time threads because there are many cases where events should be delivered without upcalling. For example, when a timer expires and the priority attached to the timer is not the highest, the kernel just notifies the ULS but does not upcall. The current user-level thread information and the user-level timer data are used by the kernel to learn their status and attributes. Placing them in the structure lets the kernel obtain the information it needs to control VPs and the kernel-provided timer, which is used as an alarm clock.
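A minimal sketch of what such a shared region might contain is shown below, assuming a fixed-size event ring buffer and a fixed array of timer descriptors; all field names and sizes are illustrative, not RT-Mach's actual layout.

    #include <stdint.h>

    #define EVENT_BUF_SIZE 16
    #define MAX_UL_TIMERS  32

    struct ul_event {                  /* one asynchronously delivered event */
        int      kind;                 /* e.g. timer expiration, thread unblock */
        uint32_t arg;                  /* e.g. index of the affected thread/timer */
    };

    struct ul_timer_data {             /* per-timer data visible to the kernel */
        uint64_t expire;               /* absolute expiration time */
        int      priority;             /* priority of the attached thread */
        int      active;
    };

    struct shared_region {
        /* ring buffer: the kernel can post events without upcalling */
        struct ul_event events[EVENT_BUF_SIZE];
        uint32_t        ev_head, ev_tail;

        /* current user-level thread information, readable by the kernel */
        uint32_t current_thread;
        int      current_priority;

        /* user-level timer data, used to drive the kernel's alarm clock */
        struct ul_timer_data timers[MAX_UL_TIMERS];

        volatile uint32_t lock;        /* lock word protecting the region */
    };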

3.3 User-Level Timer

User-level real-time threads have timing attributes such as a start time, a period, and a deadline. User-level timers are employed by user-level real-time threads to enforce their timing constraints. Timers are entities with three properties: an expiration time, a synchronization action to be taken, and an activity state [8]. By using timers, timing management is separated from thread management. The next section describes how a ULS interacts with the kernel to manage user-level timers.
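The three properties can be captured in a small structure like the sketch below; the field names, the extra priority field, and the sorted next link are assumptions used for illustration, not the RTC-Threads definitions.

    #include <stdint.h>

    typedef uint64_t utime_t;              /* absolute time, e.g. in microseconds */

    enum timer_state { TIMER_INACTIVE, TIMER_ACTIVE, TIMER_EXPIRED };

    struct ul_timer {
        utime_t          expire;           /* expiration time */
        void           (*action)(void *);  /* synchronization action to take */
        void            *action_arg;       /*   e.g. the thread to wake up */
        enum timer_state state;            /* activity state */
        int              priority;         /* priority of the attached thread */
        struct ul_timer *next;             /* active timers, sorted by expire */
    };

    /* A periodic thread would typically re-arm its timer from the action,
     * e.g. action = wake_thread, action_arg = the sleeping thread.        */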

4 Efficient Timing Management Mechanism

Efficient timing management for user-level real-time threads is achieved by the close cooperation of the kernel and a ULS via the kernel/user shared structure. First, the kernel part of the mechanism is described; then the part performed in a ULS.

4.1 User-Level Timer Support in the Kernel

In our architecture, user-level timers are multiplexed on a single kernel-provided timer, which functions as an alarm clock for a ULS. The kernel obtains the necessary timing data of the user-level timers via the shared kernel/user data structure and sends events when necessary. The following sections describe how user-level timers are processed and when a ULS is upcalled.

4.1.1 Processing User-Level Timers

The roles of the kernel-provided timer employed by a ULS are (1) to examine user-level timers, (2) to update two hint variables, timer_hint_upcall and timer_hint, and (3) to upcall the ULS when necessary. The two hint variables have the following meanings:

• timer_hint_upcall points to the user-level timer whose time is set in the kernel-provided timer. That user-level timer has the highest priority among the user-level timers which expire at that same time.

• timer_hint caches the user-level timer from which the kernel-provided timer starts examining user-level timers when it expires. The user-level timers are examined to find the next value of timer_hint_upcall; the timer following that batch is then cached in timer_hint to reduce the number of user-level timers which have to be examined the next time.

Figure 2 depicts an example of their usage. Assuming the current time is 8:55, the kernel-provided timer expires at 9:00, since timer_hint_upcall points to a user-level timer which expires at 9:00. The user-level timer which expires at 9:10 is the next one after those which expire at 9:00, so timer_hint points to the one which expires at 9:10.

Figure 2: Hints to Manage User-Level Timers (a timeline of user-level timers at 8:50, 9:00, 9:00, 9:10, and 9:15 with the current time 8:55; timer_hint_upcall points to a 9:00 timer and timer_hint to the 9:10 timer)

4.1.2 Thread Wakeup by Timer

When a user-level timer expires, the kernel needs to upcall a ULS and have it process the timer (1) if there is no current VP in the task, or (2) if the priority of the expired user-level timer is higher than that of the current VP. If the expired user-level timers do not have higher priorities than the current VP, the kernel just updates the two hint variables.

In the first case, since there is no current VP to execute the ULS and process the expired user-level timer, the kernel takes one of the reserved VPs, makes it the current VP, and makes the new current VP upcall the ULS.

In the second case, the current user-level thread must be preempted and the expired user-level timer must be processed. If the current VP can be used to upcall the ULS, the execution context of the preempted user-level thread is passed to the ULS and the ULS is upcalled. If the current VP is executing inside the kernel, it cannot be used; one of the reserved VPs then becomes the new current VP and upcalls the ULS.
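The kernel-side path sketched below follows the description above: when the alarm clock fires, the timer named by timer_hint_upcall either triggers an upcall or only an event, and the scan for the next timer starts from timer_hint. The structures and helpers (shared_hints, ul_timer, post_event, make_vp_upcall, arm_kernel_timer) are simplified stand-ins, not RT-Mach interfaces, and the main() mimics the Figure 2 situation with made-up clock values and priorities.

    #include <stdio.h>

    struct ul_timer {
        unsigned long    expire;        /* absolute expiration time */
        int              priority;      /* priority of the attached thread */
        struct ul_timer *next;          /* active timers, sorted by expiration */
    };

    struct shared_hints {
        struct ul_timer *timer_hint_upcall; /* timer armed in the kernel timer */
        struct ul_timer *timer_hint;        /* where the next scan starts */
        int current_priority;               /* priority of the current user thread */
        int has_current_vp;                 /* 0 if every VP is inside the kernel */
    };

    /* Stubs standing in for the real kernel actions. */
    static void post_event(struct ul_timer *t)     { printf("event, priority %d\n", t->priority); }
    static void make_vp_upcall(struct ul_timer *t) { printf("upcall, priority %d\n", t->priority); }
    static void arm_kernel_timer(unsigned long w)  { printf("alarm re-armed for %lu\n", w); }

    static void alarm_clock_expired(struct shared_hints *sh)
    {
        struct ul_timer *expired = sh->timer_hint_upcall;

        /* (1) upcall only if there is no current VP, or if the expired timer
         *     has a higher priority than the current user-level thread      */
        if (!sh->has_current_vp || expired->priority > sh->current_priority)
            make_vp_upcall(expired);
        else
            post_event(expired);            /* notify the ULS without upcalling */

        /* (2) scan from the cached hint: among the timers sharing the next
         *     expiration time, keep the one with the highest priority       */
        struct ul_timer *scan = sh->timer_hint;
        struct ul_timer *best = scan;
        while (scan && scan->expire == best->expire) {
            if (scan->priority > best->priority)
                best = scan;
            scan = scan->next;
        }

        /* (3) update both hints and re-arm the kernel-provided timer */
        sh->timer_hint_upcall = best;
        sh->timer_hint        = scan;       /* first timer past that batch */
        if (best)
            arm_kernel_timer(best->expire);
    }

    int main(void)
    {
        /* mimics Figure 2 with hypothetical clock values and priorities:
           timers at 9:00 (x2), 9:10, and 9:15                             */
        struct ul_timer t4 = { 915, 1, NULL };
        struct ul_timer t3 = { 910, 2, &t4 };
        struct ul_timer t2 = { 900, 3, &t3 };
        struct ul_timer t1 = { 900, 5, &t2 };
        struct shared_hints sh = { &t1, &t3, 4, 1 };

        alarm_clock_expired(&sh);  /* t1 beats priority 4: upcall, alarm -> 910 */
        return 0;
    }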

4.2 Timer Management in ULS

There are three cases in which a ULS processes user-level timers differently: (1) when a ULS is upcalled because of a timer expiration, it processes the designated user-level timer; (2) when a user-level timer expires without an upcall, a ULS needs to process it later; (3) when a user-level timer is enqueued, the two hint variables and the kernel-provided timer of the task need to be updated if necessary. How a ULS should manage user-level timers in these cases is described below.

4.2.1 Efficient Event Processing in ULS

As part of its interface to VPs, a ULS for user-level real-time threads must handle upcalls of VPs from the kernel. When an event occurs which changes the current scheduling decision of a ULS, the kernel needs to upcall the ULS for rescheduling. An upcalling VP invokes the ULS, and the event is processed. The ULS then calls its scheduling routine and transfers execution to the user-level thread which has the highest priority among the runnable user-level threads.

As described above, an upcalling VP must pass through a ULS to invoke a high-priority user-level real-time thread. The overhead of the ULS must therefore be kept low for an efficient timing management mechanism. Consider the case when a periodic user-level thread wakes up. A typical procedure for a ULS to process that event is the following:

1. Take an event from the buffer if there is any.
   (a) Process the event.
   (b) Put the user-level thread provided as an argument of the event into the run queue to make it runnable.
2. Process user-level timers if there was a timer event.
3. Invoke the scheduler and find the next user-level thread to execute.
4. Switch the context to the next user-level thread.

Although this procedure is typical, it contains redundant event-processing steps. The reason a ULS is upcalled is to tell the ULS that there is a runnable user-level thread with a higher priority than the current one. The kernel provides such user-level threads as the arguments of events. The kernel also provides the user-level timer which has the highest priority among the expired timers. Thus, the ULS does not need to process all user-level timers, nor to invoke the scheduler to find the next user-level thread to execute. Therefore, the entry point function of a ULS for upcalls can process events in the following way:

1. Take events from the buffer while there are any.
   (a) Process the event.
   (b) Compare and keep the user-level thread which has the highest priority.
   (c) Put the other one into the run queue.
2. Switch the context to the user-level thread which has the highest priority.

This procedure works efficiently especially when there are only a few events. Since there are few cases in which many events accompany an upcall, this is the most typical case.
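The optimized entry point can be sketched as follows, with the event buffer, run queue, and context switch reduced to stubs; uls_upcall_entry and the other names are hypothetical, not the RTC-Threads API.

    #include <stdio.h>
    #include <stddef.h>

    struct ul_thread {
        int         priority;
        const char *name;
    };

    struct ul_event {
        struct ul_thread *thread;     /* thread made runnable by this event */
    };

    /* Tiny stand-ins for the shared event buffer, run queue, and context switch. */
    static struct ul_event buf[4];
    static int buf_n;
    static struct ul_event *next_event(void) { return buf_n ? &buf[--buf_n] : NULL; }
    static void enqueue_runnable(struct ul_thread *t) { printf("queued %s\n", t->name); }
    static void switch_to(struct ul_thread *t)        { printf("running %s\n", t->name); }

    /* Optimized upcall entry: drain events, keep only the highest-priority
     * woken thread, queue the others, and switch to it directly.           */
    static void uls_upcall_entry(struct ul_thread *preempted)
    {
        struct ul_thread *best = preempted;
        struct ul_event *ev;

        while ((ev = next_event()) != NULL) {             /* 1. take each event  */
            struct ul_thread *t = ev->thread;             /* 1a. process it      */
            if (best == NULL || t->priority > best->priority) {
                if (best)
                    enqueue_runnable(best);               /* 1c. queue the loser */
                best = t;                                 /* 1b. keep the winner */
            } else {
                enqueue_runnable(t);
            }
        }
        if (best)
            switch_to(best);                              /* 2. no scheduler scan */
    }

    int main(void)
    {
        struct ul_thread low = { 3, "low" }, high = { 9, "high" }, cur = { 5, "cur" };
        buf[0].thread = &low;
        buf[1].thread = &high;
        buf_n = 2;
        uls_upcall_entry(&cur);     /* queues cur and low, then runs high */
        return 0;
    }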

4.2.2 Processing Expired User-Level Timers

Straightforward processing of expired user-level timers happens only when an expired user-level timer has a higher priority than the current user-level thread. Only the high-priority one is processed; the other expired user-level timers remain unprocessed and need to be processed later. For example, when a user-level timer expires but a ULS is not upcalled, only the two hint variables are updated; the ULS is not upcalled because the current user-level thread has a higher priority than the expired user-level timers. Thus, when the current user-level thread stops its execution, the hint variables need to be examined to see whether there is any expired user-level timer. If there is, it is processed at that time.
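A rough sketch of that deferred check is shown below, under the assumption that expired-but-unprocessed timers remain at the head of the ULS's active timer queue; active_timers, current_time, and process_expired are hypothetical names, and the exact use of the hint variables is omitted.

    #include <stddef.h>

    struct ul_timer {
        unsigned long    expire;          /* absolute expiration time */
        struct ul_timer *next;            /* active timers, sorted by expire */
    };

    extern struct ul_timer *active_timers;          /* head of the active queue */
    extern unsigned long    current_time(void);
    extern void             process_expired(struct ul_timer *t);  /* wake thread */

    /* Called by the ULS when the current user-level thread blocks or finishes:
     * any timer that expired while a higher-priority thread kept running (and
     * therefore produced no upcall) is processed now.                         */
    void uls_reap_expired_timers(void)
    {
        while (active_timers != NULL &&
               active_timers->expire <= current_time()) {
            struct ul_timer *t = active_timers;
            active_timers = t->next;
            process_expired(t);
        }
    }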

Figure 3: Timing Management for User-Level Timers (seven user-level timers (a) through (g) in the active queue; (a) and (b) have already expired, (c) and (d) expire at the next upcall, timer_hint_upcall points to (c), and timer_hint points to (e))

4.2.3 Enqueuing a User-Level Timer

When a new timer is enqueued in the active timer queue, a ULS sometimes needs to tell the kernel when it would like to be notified. In Figure 3, there are seven user-level timers; timer_hint_upcall points to timer (c), and timer_hint points to timer (e). Depending on the expiration time of the new timer, there are several separate conditions which have to be managed as follows, where T(x) stands for the expiration time of timer (x) and P(x) for its priority:

T(new) < T(c): The new timer will be enqueued before timer (c). The ULS asks the kernel to change the time at which it will be notified, and timer_hint_upcall is updated.

T(new) = T(c) and P(c) < P(new): The new timer will be enqueued before timer (c), and timer_hint_upcall is updated to the new timer.

T(new) = T(c) and P(new) ≤ P(c): The new timer will be enqueued between timer (c) and timer (e).

T(c) < T(new) < T(e): The new timer will be enqueued before timer (e), and timer_hint is updated to the new timer.

T(e) ≤ T(new): The new timer will be enqueued after timer (e).

Since timer_hint_upcall and timer_hint are allocated in the shared kernel/user data structure, the kernel can use their values whenever they are needed.
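The five cases can be sketched as follows, assuming an active timer queue sorted by expiration time and, within equal times, by priority. The structures, ask_kernel_rearm, and the retargeting of timer_hint_upcall in the first case are illustrative assumptions, and the actual insertion into the queue is omitted.

    #include <stddef.h>

    struct ul_timer {
        unsigned long    expire;      /* T(x): expiration time */
        int              priority;    /* P(x): priority */
        struct ul_timer *next;
    };

    struct shared_hints {
        struct ul_timer *timer_hint_upcall;   /* timer (c) in Figure 3 */
        struct ul_timer *timer_hint;          /* timer (e) in Figure 3 */
    };

    extern void ask_kernel_rearm(unsigned long when);  /* change the alarm clock */

    /* Update the hints when a new timer nt is added to the active queue;
     * timer_hint_upcall (c) is assumed to be non-NULL, as in Figure 3.   */
    void enqueue_timer_hints(struct shared_hints *h, struct ul_timer *nt)
    {
        struct ul_timer *c = h->timer_hint_upcall;
        struct ul_timer *e = h->timer_hint;

        if (nt->expire < c->expire) {
            /* T(new) < T(c): re-arm the alarm clock and retarget the hint */
            h->timer_hint_upcall = nt;
            ask_kernel_rearm(nt->expire);
        } else if (nt->expire == c->expire && nt->priority > c->priority) {
            /* same time as (c), higher priority: only the hint changes */
            h->timer_hint_upcall = nt;
        } else if (nt->expire == c->expire) {
            /* same time, lower or equal priority: lands between (c) and (e) */
        } else if (e == NULL || nt->expire < e->expire) {
            /* T(c) < T(new) < T(e): it becomes the new scan start point */
            h->timer_hint = nt;
        } else {
            /* T(e) <= T(new): no hint needs to change */
        }
        /* ...the timer is then inserted into the sorted queue (omitted)... */
    }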

5 Implementation

We implemented user-level real-time threads in RT-Mach MK83. (RT-Mach, Real-Time Mach [10], is an extended version of the Mach microkernel [5] that supports real-time applications.) The kernel part of the proposed mechanism was implemented in RT-Mach, and the user-level part was implemented in a user-level real-time threads library, called RTC-Threads.

The modifications made to the kernel to implement the mechanism for user-level real-time threads were quite small. About 1100 lines of C were added to the kernel source code, including the modifications to support upcalls and user-level timers and the code of some new primitives.

The RTC-Threads library was implemented mostly in C. The machine-independent part of the RTC-Threads library is made up of approximately 5000 lines of C. It includes the user-level scheduler, thread management functions, and timer management functions.

6 Evaluation

This section shows the performance of the current implementation of RTC-Threads. All benchmarks were performed on a Gateway2000 486DX2 66 MHz system and were measured using a STAT! timer board, which can take measurements accurate to 1 µsec.

6.1 Basic Performance

Table 1 shows performance comparisons between RTC-Threads and kernel-provided real-time threads (RT Threads). (The numbers differ from those found in [7], since this version of RTC-Threads supports real-time extensions and the benchmarks were measured with a timer board.)

Table 1: Basic Performance Comparisons
                              RTC-Threads   RT Threads
Signal/Wait Primitives          10 µsec       131 µsec
Thread Creation                345 µsec      1612 µsec
Timer Creation                   4 µsec       258 µsec
Thread Resume by Timer         119 µsec        98 µsec
Deadline Handler Invocation    154 µsec       209 µsec
Get Thread Attribute             2 µsec        81 µsec
Set Thread Attribute             2 µsec        71 µsec
Thread Resume                   14 µsec       117 µsec

RTC-Threads outperforms RT Threads in most of the benchmarks. The functions used by the benchmarks are very common in real-time applications. The benchmarks which can be performed entirely at the user level are significantly faster for RTC-Threads than for RT Threads; thus, the overheads for real-time applications can be reduced by using RTC-Threads.

Only resuming a thread by a timer is more expensive for RTC-Threads. To measure this cost, we ran a benchmark program in which there is only one periodic thread in a task and measured the latency between the time when the clock interrupt handler is called and the time when the thread starts to execute at the user level. The kernel overheads for RTC-Threads include checking the next user-level timer in the clock interrupt routine and setting up a new upcalling virtual processor. Although the upcall performance of RTC-Threads in this case is slower than that of RT Threads, the difference is quite small. We will show in Section 6.3 that RTC-Threads becomes better than RT Threads when there are several background threads.

6.2 Timing Management Overhead of ULS

The proposed mechanism reduces the overhead incurred by a ULS to process user-level timers and scheduling when it is upcalled. Table 2 shows the costs of the steps performed in a ULS when it is upcalled.

Table 2: Event Processing Costs
                   Previous Scheme   Proposed Scheme
Event Handling          6 µsec           21 µsec
Timer Handling         44 µsec             N/A
Thread Select          12 µsec             N/A
Thread Invoke           8 µsec            5 µsec
Total                  70 µsec           26 µsec

Event handling for the proposed scheme costs more than for the previous scheme, since events and timers are handled at the same time. Timer handling and thread selection are skipped in the proposed scheme. Since the continuation-passing style [3] is used to invoke a thread in the implementation of the proposed scheme, thread invocation is also faster. In total, the proposed scheme is approximately 2.7 times faster than the previous scheme.

6.3 Spread of Upcall Costs

In terms of the upcall cost of real-time threads, the spread of the costs should be as small as possible to provide predictable thread behavior. Upcalls occur when a higher-priority thread becomes runnable. If the spread of the upcall costs is wide, a lower-priority thread may run for an unpredictably long time while a runnable thread with higher priority exists. If the spread of the upcall costs is narrow, thread behavior becomes stable. System designers can then construct more predictable and efficient systems if RTC-Threads provides a predictable latency to invoke the highest-priority thread.

The benchmark program measures the cost of the upcall path for resuming a thread by a timer. It also runs several low-priority threads to interfere with the upcalling of the high-priority thread. The number of such low-priority threads is varied from 1 to 16 (1, 2, 4, 8, 12, and 16 were actually used). The period of the high-priority thread is 1 second, and the period of the low-priority threads is 100 msec. The total work load done by the low-priority threads can be finished in slightly less than 100 msec, and that work load is divided among all of them; thus, if there are two threads, each thread finishes its work within 50 msec.

Figure 4 and Figure 5 show the spreads of the RTC-Threads and RT Threads upcall costs, respectively. The results show that the spread for RTC-Threads is narrower than for RT Threads. The spread for RT Threads gets worse when there are 4 or more low-priority threads, while the spread for RTC-Threads does not change very much in any of the cases. The figures also show that the upcall cost becomes larger as the number of low-priority threads increases, both for RTC-Threads and for RT Threads. This is because there is more system overhead when there are more threads. The average cost is, however, smaller for RTC-Threads than for RT Threads.

Figure 4: Spread of RTC-Threads Upcall Costs (histogram of upcall latency, roughly 96 to 147 µsec, against the number of low-priority threads, 1 to 16)

Figure 5: Spread of RT Threads Upcall Costs (histogram of upcall latency, roughly 89 to 151 µsec, against the number of low-priority threads, 1 to 16)

7 Summary

User-level real-time threads require fast upcall processing for efficient timing management. Timing management at the user level can be done through appropriate support from the kernel: the clock interrupt handler notifies a ULS when a specified time comes, and such a notification causes a VP to upcall the ULS. That upcall has to be handled efficiently, since it is not a rare event for user-level real-time threads.

This paper presented an efficient timing management mechanism for user-level real-time threads. User-level timers are shared by a ULS and the kernel, so the kernel can examine them to find when it should notify the ULS. The two hint variables make it possible to track the user-level timer which has the highest priority. Therefore, a ULS can find the user-level timer which has to be processed immediately when it is upcalled.

The evaluation of the implementation with the proposed mechanism showed that the upcall performance of our user-level real-time threads is comparable to that of kernel-provided real-time threads. The spread of upcall costs is even better for our user-level real-time threads.

References

[1] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991.

[2] E. C. Cooper and R. P. Draves. C Threads. Technical Report CMU-CS-88-154, School of Computer Science, Carnegie Mellon University, February 1988.

[3] R. P. Draves, B. N. Bershad, R. F. Rashid, and R. W. Dean. Using Continuations to Implement Thread Management and Communication in Operating Systems. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991.

[4] R. Govindan and D. P. Anderson. Scheduling and IPC Mechanisms for Continuous Media. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991.

[5] D. Golub, R. Dean, A. Forin, and R. Rashid. Unix as an Application Program. In Proceedings of the USENIX Summer Conference, June 1990.

[6] B. D. Marsh, M. L. Scott, T. J. LeBlanc, and E. P. Markatos. First-Class User-Level Threads. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991.

[7] S. Oikawa and H. Tokuda. User-Level Real-Time Threads. In Proceedings of the 11th IEEE Workshop on Real-Time Operating Systems and Software, May 1994.

[8] S. Savage and H. Tokuda. RT-Mach Timers: Exporting Time to the User. In Proceedings of the USENIX Mach 3rd Symposium, April 1993.

[9] K. Schwan, H. Zhou, and A. Gheith. Multiprocessor Real-Time Threads. ACM Operating Systems Review, Vol. 26, No. 1, 1992.

[10] H. Tokuda, T. Nakajima, and P. Rao. Real-Time Mach: Towards a Predictable Real-Time System. In Proceedings of the USENIX Mach Workshop, October 1990.