Asynchronous Event Handling in Distributed Object-Based Systems *

Sathis Menon, College of Computing, Georgia Institute of Technology, [email protected]
Partha Dasgupta, Dept. of Computer Science, Arizona State University, [email protected]
Richard J. LeBlanc, Jr., College of Computing, Georgia Institute of Technology, [email protected]
Abstract

This paper discusses the design of, and the operating system support necessary for, asynchronous event handling in distributed, passive object-based programming environments, where objects are potentially shared by disparate applications. We discuss the need for thread-based as well as object-based event notification, and how a variety of hard-to-solve distributed programming problems can be tackled using the approach outlined in the design.

Keywords: [Distributed Programming Environments, Passive Objects, Threads, Concurrency, Events]
1. Introduction

Distributed programming has always been recognized as an order of magnitude more complex than centralized programming, and novel system structuring paradigms hold significant promise in making this problem tractable. Much of the current research on distributed operating systems and the associated programming environments is concerned with simplifying the task of distributed programming by providing constructs that make these systems look and feel like centralized systems. To achieve this goal, various system structuring techniques have been tried. One such structuring involves distributed programming environments based on computation using objects (Eden [Almes 85], Argus [Liskov 87], Clouds v.1 [Spaf 86] and so on). Structuring such object-based systems using Distributed Shared Memory (DSM) is becoming a viable paradigm [Dasgupta 91]. This paper focuses on programming environments based on distributed objects and distributed shared memory.

Despite the advances in the design of these DSM-based programming environments, programming these systems remains overly complex. Since the complexities of the distributed environment are masked by the DSM layer, mechanisms such as signals, which must work across the distributed setting, are not easily provided. Consider the UNIX operating system, where the signal mechanism is often used to control processes (notify, terminate, etc.). In a distributed setting, where individual objects and threads span machine boundaries, a facility that mimics the signal mechanism is complex to achieve. Moreover, the interaction between the multiple threads of control in an object and the signals posted for the object has not been defined precisely.
* This work is funded in part by NSF grant CCR-8619886.
Due to the autonomous nature of distributed computations, asynchronous event notification is even more important in distributed programming than in centralized programming. Also, since the activity in a distributed program spans a large domain of machines, “unexpected” occurrences are far more probable than in centralized systems. Consider the problem of unlocking shared data items upon the abnormal termination of a distributed computation. Often it is not even possible to know, from a central vantage point, of all the locks the computation has acquired, and therefore cleanup may be impossible. Facilities that raise and deliver asynchronous events to cooperating entities greatly simplify such tasks in distributed applications. From an application’s performance perspective, an important distributed programming technique involves starting up multiple processes (or threads) to perform a task concurrently and then asynchronously notifying each other of partial results obtained (unexpected discoveries, quicker heuristic searches, etc.). A generalized notification scheme is useful in implementing such algorithms. Thus we feel that a complete, well-designed event notification system is of great importance to the programmability of a distributed environment.

However, distributed notification is an obviously difficult problem. Distributed applications are composed of many different abstract threads of execution. Each abstract thread of execution may span many different machines, and its present location may be indeterminable. In addition, in DSM-based systems, the current location of each data item may be at several different sites, concurrently. Hence the simple concept of “posting an event” raises serious questions of “post it to which thread, at which site, in which object,” and so on.

In this paper, we propose a basic set of mechanisms based on distributed, asynchronous events. Events in our system can be handled on a per-thread basis or a per-object basis. The event generation and notification mechanism provided by the base kernel can be used to build system constructs such as hardware exception notifications and user-level software signals. In addition, the generality of the event mechanisms can be used to implement a host of user services such as external virtual memory managers, debuggers, monitors and synchronization services. After discussing the system model in detail, we present the design of an event handling system, along with the semantics of the event mechanisms. We discuss the usefulness of the design
using some examples. Then we outline our implementation strategy and finally discuss related work in this area.
2. The Model

The model of the system that we chose is one where objects are passive and persistent. Applications are typically composed of several such objects, possibly spanning the network. Objects may allow concurrent execution by multiple threads. The threads active inside an object may all belong to the same application or to different applications. In the latter case, the object is being shared by different applications.

Objects in shared-memory-based systems exchange control via invocations. The calling thread invokes the desired entry point in the called object. Invocations are similar to procedure calls, except that they cross object boundaries (address spaces). In the passive-object paradigm, when an object invokes another, the same logical thread is used to execute the code in the called object. The thread’s attributes (such as parameters of the invocation, state of the I/O connections, etc.) become visible to the called object and are used to exchange information between the caller and the callee. Thread attributes are a key to our design and will be discussed in detail in subsequent sections. In the rest of this paper we refer to this programming environment as the “Distributed-Object/Concurrent-Thread” (DO/CT) programming environment.

The DO/CT environment discussed in this paper is based on the Clouds Distributed Operating System [Dasgupta 91, Ananth 91]. While the DO/CT environment has been well covered in the literature [Chin 91], the event handling problems and facilities are not well documented. Some known problems are:

• Programming environments that allow concurrent execution by several threads in one address space do not clearly specify the semantics of signal handling. For example, the programming environment provided by OSF/1 uses ad hoc solutions to figure out which thread should be notified when a signal is posted to the process [Doeppner 92].

• Consider the signaling of an application composed of multiple processes on a centralized system. Despite the complexity of the application, it can be controlled (stopped, terminated, restarted, etc.) using the information contained in centralized kernel data structures. In the DO/CT environment, the state of an application is distributed across the local kernels running on the individual nodes.

• Conventional process- or task-based signaling is no longer adequate in DO/CT environments, since an object may be potentially shared by threads belonging to unrelated applications.

From the above observations, we derive the following set of design goals:

• Provide a generalized notification mechanism suitable for use in a DO/CT environment.

• Ensure that the mechanism works identically regardless of whether the objects are invoked using RPC or DSM.

• Provide for events that are logically related to a particular object as well as events that are logically related to a particular computation or sub-computation.
• Supplement the features of the notification mechanism so that higher level functionality such as exception handling, distributed lock management, distributed monitoring, debugging, and so on, can be easily implemented.
3. Events in a Distributed System

In this section, we characterize synchronous and asynchronous events and their notification and handling in the DO/CT environment. First, the terminology used throughout this document is described in more detail.

An event is the occurrence of an observable (in a program sense) activity in the system. Messages, page faults, hardware traps, etc. are events that influence the execution of a user program. For this discussion, we consider events as activities explicitly generated by the user program or implicitly generated by the system. The act of triggering an event is known as raising (or signaling) the event. Predefined events, which are raised by the operating system, are termed system events. For example, a division by zero in a user program leads to the raising of a system event by the operating system. Other system events are page faults, alarms, hardware exceptions, etc. Application programs may name events and raise them explicitly. Such events are termed user events. Naming an event involves registering the name with the operating system. For example, names such as COMMIT, ABORT and SYNCHRONIZE can be registered by an application and raised later to communicate with its group members.

Raising an event results in a notice being sent to a set of interested recipients. Delivery of the notice eventually results in the execution of some code by its recipient (or an entity designated by the recipient), usually called the event handler. The act of selecting the set of recipients and posting the event to the recipients is known as event notification.

Event notification can be broadly categorized as synchronous or asynchronous, with respect to the raiser of the event. If raising the event causes the signaling thread to block until it is explicitly resumed by a handler, it is termed a synchronous notification. If the thread raises the event but does not block, it is termed an asynchronous notification. Most system events use synchronous notification. For example, a page fault causes the thread that (implicitly) raised the event to be blocked until it is resumed after the page is available. Not all system events need to be notified synchronously: a timer notification is an event that does not cause the sender of the event (a kernel thread) to be blocked. From a user program, it is possible to raise an event either synchronously or asynchronously.

Event raising may be synchronous or asynchronous, but event delivery is always asynchronous and event handling is always synchronous. That is, if an event is delivered to an executing thread/process, the process is stopped at the point of delivery (unexpected, hence asynchronous). After the handler finishes executing, the suspended thread is resumed or terminated (hence synchronous).
3.1. Attributes of the DO/CT Environment

The design of the event handling facility is influenced by the structural features of the target DO/CT environment. These characteristics are:

• Sharability: As mentioned previously, one object may be shared by threads belonging to unrelated applications. Events posted to a thread should not affect the behavior of the unrelated threads inside the object address space.

• Persistence: Objects in our model are persistent by nature and may exist passively (i.e., without any thread associated with them). These objects should be able to handle events posted to them, even if there is no thread active inside them.

• Dominance: In an application composed of multiple objects, the object in which the current thread is active may not be able to make all decisions concerning events arising due to the thread’s execution [Levin 77]. For example, when a thread T in object O receives a TERMINATE event, the cleanup activity that the event entails may not be definable or serviceable within O. In such cases, other objects higher up in the hierarchy of the invocation chain or even unrelated objects (typically, central servers) might have to be given dominating influence over the current object’s action.

• Thread Contexts: As mentioned previously, in a passive object paradigm, object invocation is carried out by mapping the same (logical) thread into different objects. This is unlike the RPC style of communication, where threads are bound to processes and a remote procedure call is a sugar coating on top of message passing. Consequently, the state of the client is not visible to the server, and any state information required by the server must be explicitly passed as parameters to the RPC call. In our view, a procedure call, whether local or remote, must have access to the global state of the computation. As an example, consider a local procedure, within a process, named foo. Assume that the process is connected to an I/O channel (such as an X terminal window). If control is transferred from foo to a procedure bar, any output from bar also goes to the same terminal window, without the programmer explicitly performing any redirections. Thus, the state of the control mechanism (the thread) is visible across all the procedures. Our environment based on passive objects merely extends this view across distributed objects and invocations [Menon 93] by encoding threads with attributes. Thread attributes contain information such as the connections to the I/O channel that the thread is using, the creator of the thread, consistency labels for the thread [Chen 89], etc. Event information is a natural addition to the attributes (a rough sketch follows).
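For concreteness, a per-thread attribute record can be pictured roughly as below. This is only an illustration; the field and type names (thread_id, io_channel, label, event_entry and the array bounds) are ours and not part of the Clouds interface.

// Illustrative sketch of the attributes that travel with a logical thread.
struct thread_attributes {
    thread_id   creator;               // thread (or application) that created this thread
    io_channel  channels[MAX_IO];      // I/O connections visible in every invoked object
    label       consistency;           // consistency labels for the thread [Chen 89]
    event_entry events[MAX_EVENTS];    // attached events and their handler locations
};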
3.2. Design Choices

The above characterization of the sharability of an object leads to the obvious design choice: most events need to be handled on a per-thread basis. Handling events on a per-thread basis allows applications to install customized handlers for events raised while the application’s threads are active inside a shared object. These handlers remain active for the thread regardless of where the thread is currently executing. We call this class of handlers thread-based handlers.

The persistent and passive nature of an object implies that events may be posted to an object even when there is no thread active inside it. Thread-based handlers are inappropriate for these cases; what we need are simple, conventional, object-based handlers. Object-based handlers are installed for an object and remain active while the object persists.
4. Handling Events by Threads and Objects

In this section, we describe the two classes of handlers in more detail and provide further justification for why both classes are necessary.
4.1. Thread-based handlers

Thread-based handlers are used for customized handling of events arising from a shared object. The object interface lists the events it wishes the application to handle (thus allowing customization by the application), and the invoker of the object attaches handler information to the thread. Once a handler has been attached to handle an event, it remains active as long as the thread is alive. When the event is generated, it is delivered to the handler specified by the thread.

The design allows the handler to be executed either within the context of the object in which the event was raised or within the context of the object in which the handler was attached. In the former case, the handler code, located in the thread’s private memory (called per-thread memory [Dasgupta 90]), is mapped into the object’s address space and executed. Allowing the programmer to decide the context of the handler is a powerful technique that can be used to implement a variety of diverse mechanisms such as exception handling and monitoring.

Consider exception handling, where conventional wisdom dictates that the exception be repaired (if possible) from a safe vantage point outside the context of the signaler [Levin 77]. In the DO/CT paradigm, when an object invokes another, the invoker supplies a handler for exceptional events that the invoked object cannot handle. The handler performs any corrective action (if possible) and resumes (or terminates) the signaling thread. On the other hand, consider a monitor for a distributed application that samples a thread’s program counter value periodically and sends the information to a monitor system. Such an application requires a timer event to be generated and handled within the context of the object where the thread is currently active. This is accomplished by attaching a handler for the timer event and executing the handler in the current object’s context. A similar mechanism can be used to implement debuggers that need access to the internals of the application being debugged. Details of the monitor application are discussed in Section 6.

In summary, a handler can be one of the following:

• An entry point defined within the scope of the thread at the point the thread enables the handler. A thread can attach handler H in object O for event E while it is executing in object O. As long as the thread is active, events of type E will be handled by H in the context of O, regardless of where the thread is located when E is delivered. An extension to this scheme is one where the handler is an entry point defined in another object. These kinds of handlers are known as “buddy handlers” [Ousterhout 81]. Buddy handlers are handlers known to the current object in which the handler attachment is done. This is quite useful in implementing monitors, debuggers, etc., where an application can specify a central server as the event handler for events posted to its threads.

• A procedure defined in the per-thread area of the thread. The compiled procedure travels with the thread and is made visible within the current object in which the thread is executing. Executing within the context of the current object enables the handler to examine and, if desired, modify the state of the object/thread.

Information necessary to handle the event is encapsulated in a structure called an event block and is passed to the handler. The event block contains generic system information, such as the state of the registers for exception handling, and space for user-defined data structures for user events.
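A sketch of what an event block might look like is given below. The design does not prescribe an exact layout; the field names (event, raiser, registers, user_len, user_data) and types are illustrative only.

// Sketch of an event block passed to every handler; fields are illustrative.
struct event_block {
    event_name event;        // system or user event that was raised
    thread_id  raiser;       // thread on whose behalf the event was raised
    cpu_state  registers;    // register state, used for exception handling
    int        user_len;     // length of the appended user-defined data, if any
    char       user_data[];  // optional user-defined structure for user events
};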
4.2. Chaining of Handlers

Each object the thread visits is free to attach its own handler to the thread. When a new handler is attached to a thread that already has a handler attached for the event, the new handler can be attached in a LIFO fashion. This is known as chaining of handlers. A consequence of chaining handlers to a thread is that there may be multiple hierarchies of thread-based handlers eligible to handle the same event. Each object the thread traverses may attach its own handler for potential events arising from the object it is going to invoke next. In this case, the event information is passed along the calling chain. This scheme is similar to the dynamic propagation of exceptions in Ada [Barnes 82].

Chaining of handlers is very useful in distributed lock management. Every time a thread locks data in an object, the unlock routine for that data is chained to the thread’s TERMINATE handler. If the thread receives a TERMINATE signal, all locked data are unlocked, regardless of their location and scope (see the sketch below).

In general, chaining of handlers is necessary to ensure the proper filtering of events between neighboring objects. For example, if an application uses three objects O1, O2 and O3, events arising in O3 may only be known to O3, and perhaps to O2 if return parameters signify that. If the event needs to be propagated back to O1, it must be transformed into a form understandable to O1. Using chaining, O3 can notify the handler attached in O2, which in turn can notify the handler attached in O1.
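The lock-management use of chaining could look roughly as follows, using the attach_handler call from Section 5.2. The helpers acquire_lock, note_lock and release_locks_held_by are hypothetical; only attach_handler, TERMINATE and event_block come from the design.

// Sketch: inside a shared object, at the point where a thread locks data.
void shared_object::lock_record(record_id r) {
    acquire_lock(r);                             // hypothetical lock primitive
    note_lock(r);                                // remember that the current thread holds r
    // Chain the unlock routine onto the thread's TERMINATE handler; handlers
    // attached earlier by other objects stay on the chain (LIFO order).
    attach_handler(TERMINATE, unlock_held_records);
}

// Runs in this object's context if the thread is terminated anywhere in the system.
void shared_object::unlock_held_records(event_block& eb) {
    release_locks_held_by(eb);                   // undo this thread's locks in this object
}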
4.3. Object-based handlers

Object-based event delivery is quite orthogonal to thread-based handlers, because events are delivered to passive entities (objects) instead of active entities (threads). An object can attach event handlers to various system and user-defined events upon initialization (or later, during an invocation). If any such event is delivered to the object, the handler attached to the object is executed. In this regard, object-based event handling can be viewed as another method of invoking operations on objects. However, there are some differences between object invocation and raising an asynchronous event for an object. First, raising the event causes a kernel thread to perform an implicit invocation, which is not semantically visible as a normal object invocation. Second, all objects have a set of predefined system events with defined handlers. Since these are available in all objects, it is possible to raise a system event on any object — this is not possible with object invocations, as there is no enforced set of standard routines available in all objects. Third, object-based events can be raised implicitly by the operating system (for example, exceptions) or explicitly by the user program itself. Finally, the mechanism with which the invocation is carried out may have much less overhead than object invocations. For example, a handler thread can be associated with the object to handle all events on its behalf, thus eliminating thread-creation costs.
5. Programming with Events and Handlers

Programming with events in the DO/CT environment requires that it be possible to direct events at threads as well as at passive objects. Programmers can choose the correct semantics by using the various addressing schemes provided by the event handling facility. In addition, both synchronous and asynchronous raising of events is necessary to use the event mechanism as an exception handling facility as well as a general purpose communication mechanism. In this section, we discuss the programming interface for the specification of events and handlers for object-based and thread-based event handling. Next, we discuss the selection of delivery options, followed by options for synchronous and asynchronous delivery.
5.1. Specifying Object-based handlers

The operating system specifies the default behavior when events are delivered to objects. Programmers can explicitly override the default behavior by placing handlers for events as part of the object specification. The object-based handlers are registered when the object is initialized and are part of the initialization code of the object. The template, as applied to a sample object called my_object, is shown below:

class my_object {
private:
    handler void my_delete_handler(event_block&) on { DELETE };
public:
    // invocable entry points
    entry void init();
    entry void work(int id);
};
The entry points added to the interface specification and their semantic meaning are described below. The keyword handler prefixing the name of the method my_delete_handler specifies an object-based handler that the object has installed to handle the event named DELETE. Note that the visibility of this handler is private, implying that this method cannot be invoked directly. When the DELETE event is posted to the object, the handler named in the interface will be automatically invoked. The handler will be passed an event block containing a kernel-defined data structure. If the named event is a user event, an optional user-defined structure will be appended to the event block.
5.2. Specifying thread-based handlers

Handlers for thread-based event handling are attached to threads using a system call, attach_handler. The system call parameters specify the type of event and the location of the handler. The following example shows a thread attaching thread-based handlers, one of which handles exceptions. The initialization code of the object shown in the previous example is executed by threads entering the object; this causes the events and the names of the handlers to be registered with the operating system.

void my_object::init() {
    attach_handler(INTERRUPT, my_interrupt_handler);
    attach_handler(VM_FAULT, my_server.fault_handler);
    attach_handler(TIMER, monitor_thread, OWN_CONTEXT);
}
In the above example, the thread executing the code attaches the handler my_interrupt_handler, which is a private method in my_object, to handle the event INTERRUPT. In the next statement, the thread attaches a handler in the object instance named my_server to handle the VM_FAULT event. Finally, the thread attaches a compiled procedure named monitor_thread to capture the system event named TIMER. The procedure monitor_thread will be executed in the context of the current object where the thread receives the TIMER event.

Various linguistic mechanisms can be used to enforce restrictions on the generality of the mechanisms described here. For example, simple exception handling only requires that the invoker place handlers for exceptions that arise when an object invokes another. This can be accomplished by the following means:

• Entry point signatures in the object interface specify the exceptional events raised by the entry points.

• The calling object attaches handlers to these exceptional events at the point of invocation.

• The scope of the handler is restricted to its immediate caller.

A detailed description of the language interface is beyond the scope of this paper and is available as a technical report [Menon 93].
5.3. Routing of events to handlers

To use the event mechanism, the semantics of raising an event and its subsequent delivery must be clearly defined. User programs may use the raise system call to raise an event. Once an event is raised by the user (explicitly) or by the system (implicitly), it needs to be routed to the appropriate location(s) where it can be handled. Routing the event needs a destination address: either the name of an object or the name of a thread. The set of valid recipients of events is:

• Passive objects: persistent objects without any thread active in them are potential targets for events. Events delivered to these objects are handled by object-based handlers.

• Current thread: most system events are posted to the current thread.

• Thread groups: threads belonging to an application can form a thread group, and an event posted to a thread group will be sent to all the members of the group. This is based on the notion of process groups [Cheriton 85].

• Unrelated thread: to support buddy handlers, it should be possible to send an event to any thread.

In addition, the sender can send the event synchronously or asynchronously. A synchronous send blocks the sender until it is explicitly resumed by a handler; an asynchronous send does not block the sender. Synchronous send is achieved by a variant of the raise call, called raise_and_wait. The addressing and blocking options are summarized in the table below:
Call                        Recipient of event e
raise(e, tid)               Thread tid
raise(e, gtid)              Threads in group gtid
raise(e, oid)               Object oid
raise_and_wait(e, tid)      Thread tid, synchronously
raise_and_wait(e, gtid)     Threads in group gtid, synchronously
raise_and_wait(e, oid)      Object oid, synchronously
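As an illustration of these addressing options, a worker thread that has found a better partial result might notify its thread group asynchronously, while a thread that must wait for acknowledgement uses the blocking variant. RESULT_FOUND and CHECKPOINT are assumed to be user events registered earlier by the application, and my_group_tid, coordinator_tid and shared_object_oid are illustrative names.

// Asynchronous: notify all members of the application's thread group.
raise(RESULT_FOUND, my_group_tid);

// Synchronous: block until a handler explicitly resumes this thread.
raise_and_wait(CHECKPOINT, coordinator_tid);

// Post an event to a passive object; its object-based handler will run.
raise(SYNCHRONIZE, shared_object_oid);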
6. Applications of Event Handling

In this section, we illustrate the use of object-based and thread-based handlers for applications running in DO/CT environments. First, we outline how thread-based handlers can be used to implement simple exception handling. Next, we show how thread-based handlers can be used to implement monitoring of a distributed computation for liveness. An application that requires a combination of object-based and thread-based handlers is outlined in the “distributed ^C problem.” Finally, we illustrate how applications can control virtual memory operations by setting thread-based handlers for VM_FAULT events.
6.1. Exception Handling

Exceptions are system events that arise due to the execution of code in an object by a thread. In most cases, exceptions arising while a thread is active inside an object can be handled by a handler in the object itself. An object may wish to take some generic corrective action on an exception before it is propagated to the user (invoker) of the object. To achieve this, an object can define handlers for system events of interest as part of the object interface. When an exception is raised for any thread, the object’s handler is called and, if necessary, a further exception may be raised by the object handler, to be handled by the thread handler. The object handler can be run using a surrogate thread (a thread that takes on the attributes of the suspended thread that received the notification) so that the context of the original thread can be examined and modified.
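A sketch of this pattern follows, using the interface from Section 5. The event name ARITHMETIC, the objects calculator and client_object, and the methods log_and_reraise and repair_or_abort are illustrative; only handler, entry, attach_handler and event_block come from the design.

// Callee: an object-based handler takes generic corrective action first.
class math_object {
private:
    handler void log_and_reraise(event_block&) on { ARITHMETIC };
public:
    entry int divide(int a, int b);
};

// Caller: a thread-based handler attached at the point of invocation, so the
// exception can be repaired (or the thread terminated) outside the callee.
void client_object::compute() {
    attach_handler(ARITHMETIC, repair_or_abort);   // private method of client_object
    result = calculator.divide(total, count);      // a fault runs log_and_reraise,
                                                   // then repair_or_abort decides
}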
6.2. Distributed Monitoring

Monitoring applications for liveness is a difficult task in a distributed setting. In this example, we consider a distributed application that is composed of a collection of objects and a thread of computation that spans machine boundaries. We wish to monitor the application by sending periodic information about the state of the thread (such as the current object the thread is executing in, the current program counter value, etc.) to a central server. The central server may use the symbol table information from the compiled objects to display the state of the application.

To monitor the thread, two facilities are required: a periodic timer delivered to the thread and a handler to execute when the timer event is received. The former is achieved by using thread attributes: a timer event is added to the thread’s attribute list (from within another object, such as a simple monitor that merely attaches a handler for the event TIMER to the thread). When the thread visits another node, the thread attribute list is examined and the event registration information is recreated. This ensures that a timer event will be delivered to the thread regardless of where it is currently executing. The handler for the event is a procedure that gets mapped into the thread’s per-thread memory area. Since this is in the same address space as the object that the thread is executing in, the handler simply gets the suspended thread’s state, restarts the thread and sends the information to a central monitor.
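The monitor attachment could be sketched as follows. The calls thread_state_of, resume_thread and send_to_central_monitor are hypothetical; attach_handler, TIMER, OWN_CONTEXT and event_block are the facilities described above.

// Invoked once by the application thread from a simple "monitor" object; the
// TIMER registration travels with the thread via its attribute list.
void monitor_object::watch_me() {
    attach_handler(TIMER, sample_and_report, OWN_CONTEXT);
}

// Compiled procedure mapped into per-thread memory; runs in whatever object the
// thread is executing in when the timer fires.
void sample_and_report(event_block& eb) {
    sample s = thread_state_of(eb);     // current object, program counter, ...
    resume_thread(eb);                  // restart the suspended thread first
    send_to_central_monitor(s);         // then ship the sample to the server
}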
6.3. Distributed ^C Problem

In this example, we outline how a distributed application can be terminated cleanly (upon a user typing a ^C at the controlling terminal) by using a combination of object-based and thread-based handlers. Though the problem may appear trivial, it is not. Remember that the objects being used by our application may be shared with other, unrelated, applications. The list of objects includes those that the application’s threads are currently active in, as well as objects that lie along the threads’ calling chains (but may be currently passive). This is necessary so that all of the objects get a chance to perform appropriate cleanup operations (such as closing I/O channels, releasing resources held, etc.). Also, the list of threads to hunt down and terminate (lest they turn into orphans) may include threads spawned during asynchronous invocations. In short, the candidates to be notified are:

• All threads belonging to the application’s thread group.

• All objects that lie in the path between the “root object” (where the application was started) and the objects where the threads are currently active.
Assuming that the ^C typed by the user generates an event named TERMINATE, the termination of all the threads and the notification of all the objects requires the following (see the sketch after this list):

• All objects should register an object-based handler for the predefined event ABORT. When triggered, the handler must abort the invocation in progress for the thread named in the event block. This causes the system to send an ABORT event to the object at the other end of the invocation.

• The root object must attach a handler for the event TERMINATE and a handler for the user-defined event QUIT to the root thread. Any thread subsequently spawned from the root thread inherits the thread attributes (including the event registry and the handler information). When the event TERMINATE is triggered anywhere, the handler attached by the root object is notified. This handler aborts the top-level invocation (causing all objects to be notified) and raises the event QUIT to the thread group. The handler for the event QUIT simply terminates the thread.

The “chasing” of distributed threads necessary for termination is done by the operating system implementation responsible for event delivery, as discussed in Section 7. To reduce the complexity of the operations outlined here, the handlers can be made part of the object’s default interface. Details of these operations can be found in [Menon 93].
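The two pieces could be sketched as below. The handler declaration syntax follows Section 5.1; the calls release_resources, abort_invocation and terminate_current_thread, and the name application_group_tid, are hypothetical.

// Every object registers an object-based handler for the predefined ABORT event
// (this could also be made part of the object's default interface).
handler void any_object::abort_handler(event_block& eb) on { ABORT } {
    release_resources(eb);                // close I/O channels, drop locks, etc.
    abort_invocation(eb);                 // sends ABORT to the object being invoked
}

// The root object arms the root thread; spawned threads inherit the attachments.
void root_object::init() {
    attach_handler(TERMINATE, terminate_handler);
    attach_handler(QUIT, quit_handler);
}
void root_object::terminate_handler(event_block& eb) {
    abort_invocation(eb);                 // unwind from the top-level invocation
    raise(QUIT, application_group_tid);   // tell every thread in the group to exit
}
void root_object::quit_handler(event_block& eb) {
    terminate_current_thread();           // hypothetical kernel call
}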
6.4. User-level Virtual Memory Managers

Building user-level virtual memory managers (external pagers) allows applications to bypass the strict consistency imposed by the underlying sequentially consistent distributed shared memory. Implementing external pagers requires extensive support from the operating system, in terms of allowing virtual memory operations to be performed on user-level segments. Minimally, the operating system should support handling VM_FAULT events at user level, installing a user-supplied page to back a virtual address, and specifying a segment as pageable.

Thread-based handlers, along with the above mentioned facilities provided by the operating system, can be used to implement user-level virtual memory managers. The basic strategy is that the application tags regions of memory as pageable, requests VM_FAULT events and designates a server as the handler for VM_FAULT events (a buddy handler). When any thread faults at an address, the thread is suspended and the handler attached to the server is notified. The handler code then supplies a page to satisfy the fault. If another thread faults on the same memory, the server can supply a copy of the page and later merge the pages. A detailed description of the application is beyond the scope of this paper.
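A sketch of the setup, assuming hypothetical calls make_pageable, find_or_build_page, fault_address and supply_page for the virtual memory operations described above; attach_handler, VM_FAULT and event_block are as defined earlier.

// Application side: mark a region pageable and route its faults to a pager
// object, using a buddy handler attached to the calling thread.
void app_object::setup_paging() {
    make_pageable(region_base, region_len);            // hypothetical VM call
    attach_handler(VM_FAULT, my_pager.fault_handler);  // buddy handler in my_pager
}

// Pager side: runs when any thread with this attachment faults in the region.
void pager_object::fault_handler(event_block& eb) {
    page p = find_or_build_page(fault_address(eb));
    supply_page(fault_address(eb), p);   // back the address and resume the thread
}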
7. OS Support for Event Notification

In this section, we outline the primitives that need to be supported by the distributed operating system to implement object-based and thread-based event handling. The important issues to tackle are the delivery of events to threads and the execution of thread-based handlers.

Object-based event handling requires the operating system to define the default action for predefined system events, and objects must be able to override the default action. In addition, to support posting events to passive objects, a system thread needs to be employed. To reduce thread-creation costs, it is preferable to employ a master handler thread on behalf of a passive object.
7.1. Delivering Events to Threads

Thread-based event handling introduces interesting problems regarding the notification of events to threads. When an event is posted to a thread, the system must track down the thread and use the current information in the thread attribute table to run the handler. Locating a thread’s current location is similar to the general problem of finding moving resources in a distributed system [Ahamad 87]. However, finding a thread is harder, as threads move around much faster than other resources (such as objects) in the distributed system.

A simple solution to finding threads is to broadcast the event request. When the machine on which the thread is active gets the request, it can block the thread, run the handler on its behalf and then resume or terminate the thread. However, this is communication intensive and wasteful.

Another solution is to follow the path of the thread starting from its “root node” (i.e., the node on which the thread was created). We assume that given the unique name of a thread, it is possible to find the root node. Starting with the root node, one can traverse the path of the thread using information in the system’s thread-control blocks (sketched below). On a distributed system comprising n nodes, it is possible to find the thread in n steps. However, this may not find all threads if non-claimable asynchronous invocations are spawned, as the system may not keep track of asynchronous invocations whose results are not claimed.

At the expense of additional system complexity, a sophisticated thread-management system can be employed to track down the current location of threads. On systems supporting multicast communication, an application’s threads can create a multicast group. When a thread leaves the current node and starts executing in another, the thread-management system can join the multicast group. Thus, it should be possible to address each thread by sending a message to its multicast group.
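The root-node traversal can be pictured roughly as follows; root_node_of, active_here and next_hop are illustrative names for lookups against the per-node thread-control blocks.

// Sketch: follow a thread's invocation path starting at its root node.
node_id locate_thread(thread_id tid) {
    node_id n = root_node_of(tid);      // derivable from the thread's unique name
    while (!active_here(n, tid))        // ask n's kernel whether tid is active there
        n = next_hop(n, tid);           // node tid invoked into, per n's thread-control block
    return n;                           // deliver the event at node n
}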
7.2. Executing thread-based handlers

As explained earlier, when an event is delivered to a thread, there are two types of handlers that may need to be executed:

• Handler installed in the context of the object where the attachment of the event handler occurred. Alternatively, the handler could be in the context of another designated object (buddy handler).

• Handler installed in the per-thread memory of the current thread, and within the context of the object where the event was raised.
The first option requires an operation similar to a remote object invocation, except that the thread makes an “unscheduled” invocation to wherever the handler is located. If the handler is located in the current object, the thread does a local procedure call. The per-thread handler requires sophisticated OS and compiler support. The handler code has to be position independent, and the operating system must support mapping the handler code to a well-known address in the per-thread area of the thread. This well-known address must match the compiled virtual address of the handler.

Important issues related to fault tolerance are not addressed in this paper. When a notification is posted to a thread and the thread has been destroyed, the sender of the event (if it is an asynchronous event) needs to be notified. Leaving trails of information regarding the death of threads creates garbage collection problems (similar to the creation of “zombie” processes on UNIX).
8. Discussion

We have presented a general purpose event handling facility for the DO/CT environment and discussed an implementation strategy for attaining the semantics and mechanisms described. The facility allows:

• The attachment of event handlers to individual threads, even if the threads cross machine boundaries.

• The attachment of event handlers to passive objects.

• Raising of events for a thread. The thread is located in the system and then made to execute a handler, irrespective of the current location of the thread and the current location of the handler. Chaining of handlers is allowed to provide for the propagation of events through the calling chain.

The implementation strategy is relatively straightforward, given the facilities of thread creation, kernel threads, DSM and RPC invocations, and thread location. A prototype implementation is currently in progress.
9. Related Work

In this section, we survey the relevant work done on event handling in distributed and multiprocessor systems. Most modern operating systems provide primitives for exception handling. The UNIX system provides the signal mechanism to support the delivery of system events to user programs and the raising of asynchronous events (using the kill system call). UNIX also provides a small number of user-definable signals. The design of the UNIX signal facility is suitable for single-threaded applications only. Distributed programming using RPC mechanisms does not handle signals directly.

Exception handling in a “shared abstraction” setting was discussed by Roy Levin in his PhD thesis [Levin 77]. Levin discussed structural conditions and flow conditions, which are vaguely similar to object-based handlers and thread-based handlers in our case. The distinction between these two notions was not clearly identified in his work.
Medusa, a multiprocessor OS developed at CMU for the Cm* [Ousterhout 81], supported the notification of exceptions as internal events to the process that caused them and as external events to any other process that has an interest in the object in which the event arose. Interest is indicated either by being active in the object or by possessing the capability to it (similar to Levin’s proposal). Candidates for receiving external events included other processes that possess the capability to the object, as well as a “trusted buddy.” Due to the large number of potential handlers for external events, Medusa’s (as well as Levin’s) exception reporting has the potential to cause tight coupling within the system. This coupling is undesirable in a distributed system. Also, a lot of extra work needs to be done to maintain a “current interest list,” etc., and the event reporting hierarchy tree could grow out of bounds if it is not properly controlled.

The exception handling facilities in Mach [Black 89] and PLATINUM [Fowler 92] were designed to support exception handling in a message-based system that supports concurrency within an address space. Mach supports the posting of exceptions to tasks as well as threads. The design is based on the separation of functionality between error handlers and debuggers. Error handlers operate within the context of the task that reported the error event, and debuggers operate outside of this context, as a separate task. The Mach kernel statically partitions all exceptions into one class or the other, whereas PLATINUM allows this partitioning to be determined dynamically.

One of the key differences between our design and the work done in systems such as Mach and PLATINUM is due to the nature of active-object versus passive-object systems. In the former, all threads in an object belong to the object (task), whereas in the latter, the threads may belong to one application or to many unrelated applications. Since a logical thread in a passive object-based system spans many objects, possibly at many nodes, customized event handling is possible by attaching events and handlers to threads. In active object systems, application-wide event handling requires a lot of explicit coding by the programmer.
10. Conclusions

This paper presents a general purpose event handling facility that can be used for distributed programming. The design is based on the notion that in distributed object-based systems, where objects are shared by different applications, event handling needs to be based on posting events to both objects and threads. The nature of the passive object-based system presents the view that logical threads span objects across various machines, and hence any customization required can be achieved by attaching attributes to such threads. The event handling facility can be used at a “system call” level without language modifications (though some modifications for elegance are suggested). A complete description of defaults and other possible bells and whistles has been omitted due to space considerations.
11. References

[Ahamad 87] Ahamad, M., Ammar, M., Bernabeau, J., and Khalidi, M. Y.: Locating Resources in a Distributed System Using Multicast Communication. Georgia Tech Technical Report GIT-ICS-87/44, Georgia Institute of Technology, 1987.

[Almes 85] Almes, G. T., Black, A. P., Lazowska, E. D., and Noe, J. D.: The Eden System: A Technical Review. In IEEE Trans. on Software Engineering, Jan. 1985.

[Ananth 91] Ananthanarayanan, R., Dasgupta, P., Menon, S., Mohindra, A., and Chen, R. C.: Distributed Programming With Objects and Threads in the Clouds System. In Journal of Distributed and Multiprocessor Operating Systems, USENIX, 1991.

[Barnes 82] Barnes, J. G. P.: Programming in Ada. Addison-Wesley Publishing Company, 1982.

[Black 89] Black, D. L., Golub, D. B., Hauth, K., Tevanian, A., and Sanzi, R.: The Mach Exception Handling Facility. In SIGPLAN Notices, Vol. 24, 1989.

[Chen 89] Chen, R., and Dasgupta, P.: Linking Consistency with Object/Thread Semantics: An Approach to Robust Computations. In 9th International Conference on Distributed Computing Systems, June 1989.

[Cheriton 85] Cheriton, D., and Zwaenepoel, W.: Distributed Process Groups in the V Kernel. In ACM Transactions on Computer Systems, Vol. 3, No. 2, May 1985.

[Chin 91] Chin, R. S., and Chanson, S. T.: Distributed Object-Based Programming Systems. In ACM Computing Surveys, Vol. 23, Mar. 1991.

[Dasgupta 90] Dasgupta, P., and Chen, R.: Memory Semantics in Large Grained Persistent Objects. In Fourth International Workshop on Persistent Object Systems, Sep. 1990.

[Dasgupta 91] Dasgupta, P., LeBlanc Jr., R. J., Ahamad, M., and Ramachandran, U.: The Clouds Distributed Operating System. In IEEE Computer, Nov. 1991.

[Doeppner 92] Doeppner, T. W., Jr.: OSF/1 Internals. Seminar Notes, USENIX Summer Conference, June 1992.

[Fowler 92] Fowler, R., and Kontothanassis, L.: Supporting User-Level Exception Handling on a Multiprocessor MicroKernel: Experiences with PLATINUM. In Proceedings of SEDMS-III, Newport Beach, CA, Mar. 1992.

[Levin 77] Levin, R.: Program Structures for Exceptional Condition Handling. PhD Thesis, Carnegie-Mellon University, 1977.

[Liskov 87] Liskov, B., Curtis, D., Johnson, P., and Scheifler, R.: Implementation of Argus. In Proceedings of the 11th ACM Symposium on Operating System Principles, 1987.

[Menon 93] Menon, S.: Event Handling in Passive Object-Based Distributed Systems. Technical Report GIT-CC-93/09, Georgia Institute of Technology.

[Ousterhout 81] Ousterhout, J. K.: Medusa: A Distributed Operating System. UMI Research Press, 1981.

[Spaf 86] Spafford, E.: Kernel Structures for a Distributed Operating System. PhD Thesis, School of Information and Computer Science, Georgia Institute of Technology, 1986.