Customizable Object-Oriented Operating Systems

Roy H. Campbell, John Coomes, Amitabh Dave, Yongcheng Li, Willy S. Liao, Swee Lim, Tin Qian, David K. Raila, Ellard Roush, Aamod Sane, Mohlalefi Sefika, Ashish Singhai, See-Mong Tan

Department of Computer Science
University of Illinois at Urbana-Champaign
Digital Computer Laboratory
1304 W. Springfield
Urbana, IL 61801
{roy,jcoomes,dave,ycli,liao,sblim,tinq,raila,roush,sane,sefika,singhai,stan}
[email protected]
December 6, 1995
1 Introduction

Operating systems serve the needs of application programs by managing system resources and providing mechanisms by which programs may use them. In a general purpose operating system, the policies that govern the use of these resources must be flexible enough to accommodate a broad range of usage patterns by various applications. The operating system designer traditionally implements policies that most benefit a "characteristic" workload. A classic example is the choice of file system caching policies. The policy of discarding the least recently used pages from the cache (the Least Recently Used or LRU policy) is appropriate for most applications, such as text editors. However, studies have shown that database applications consistently exhibit decidedly non-LRU behavior. By attempting to cater to some average of user behavior, the operating system performs poorly for users whose behavior departs from that average. Users who run one or a few programs rather than a large, heterogeneous mix of programs matching the design profile are particularly susceptible to this. A general purpose operating system simply cannot offer the best possible performance for every individual application.

An obvious solution to this problem is customizable operating systems. In such a system, the user is allowed to customize the behavior of operating system services in an application-specific manner. In the file caching example, knowledge about temporal and spatial patterns of file access in a database program can be applied to produce a customized OS that caches files appropriately for databases.

Many modern operating systems allow customization of some aspects of system behavior. Several operating system kernels have system primitives that explicitly allow user-level programs to hook into the flow of control of system activities. An example is user-level pagers in Mach [1] and Spring [2].
Mach's virtual memory architecture allows the association of memory regions with user-level programs that are invoked through interprocess communication whenever events of interest, such as page-ins and page-outs, occur in that region.

Ease of design and efficiency of implementation determine how successfully an operating system can be customized. This article offers our solution to the problem of building customizable operating systems and describes the results achieved in implementing one. We have built an object-oriented, customizable operating system, Choices [3, 4]. We advocate object-oriented programming to structure customizable operating systems, and Choices is designed as an interacting collection of object frameworks. Descriptions of Choices, of the framework methodology, and of its application to OS subsystems are given in the next section. Structuring the system as a set of frameworks facilitates the design, maintenance, and extension
of the system, and it also improves performance by eliminating intermodule entanglements. For example, Choices achieves a process migration latency of 13.9 ms with SPARCstation 2s on Ethernet, due in part to framework design that identified and eliminated unnecessary operations from the critical path. By contrast, Sprite [5] has a migration latency of 330 ms on SPARCstation 1s on Ethernet.

Using these object-oriented techniques we have implemented three different customizable subsystems in Choices: file systems, message passing, and distributed shared memory. We describe each subsystem and its performance in detail later, but we briefly summarize some performance highlights here. Our experiences with these subsystems show that applications in Choices can indeed improve their performance by customizing Choices to their specific needs. On the Intel Hypercube (a message-passing multiprocessor machine), customizing the message passing framework for both the hardware and the application's communication patterns reduces execution time by 21% on an eight-node machine relative to PVM [6]. Customized policies in the Choices adaptive file-caching service resulted in a five-fold speedup over Sun NFS for a large-scale parallel computation. A customized implementation that relaxes distributed shared memory consistency in Choices shows noticeable improvement in performance for sample applications over both a local area Ethernet and a wide area ATM testbed.

We then describe a tool we developed, OS View, that visualizes Choices dynamically with special consideration for the object-oriented structure of Choices. OS View lets a system designer or a user view the system at a high level of abstraction by manipulating and displaying items in terms of objects or groups of related objects. This tool gives a user a dynamic understanding of the system in terms of its objects and frameworks.
Such knowledge is tremendously helpful and often essential for the user who must understand the system's internal interactions in order to customize the system dynamically as it runs. Finally, we conclude the article with a discussion of directions for our future work.
2 Frameworks and Subsystems in Choices

Customization in Choices is achieved using frameworks and subsystems implemented in C++ [7]. A framework implements the design for software components that operate together to perform an action or set of actions. Frameworks in Choices encapsulate abstract design solutions for specific features of the system. Frameworks are reusable and customizable, and they provide a specific interface and a reusable implementation. Frameworks are used extensively in Choices to support a variety of features, including the run-time class and object tracking and debugging framework that supports access to C++ type information, the process framework that defines the process model and protocol, and the hardware support framework that encapsulates the machine architecture and provides a generic protocol to it.

Every logical or physical system entity, such as CPUs, disks, memory, schedulers, address spaces, locks, and so forth, is represented as a C++ object belonging to a C++ class in a framework within Choices. There are frameworks for the process subsystem, the virtual memory subsystem, the message passing system [9], and so forth. Frameworks specify the interactions that are permitted between components and their relationships to each other [8]. The interfaces that the frameworks in Choices export to clients define a contract between the framework and the programmer. The framework agrees to perform its function on the objects provided by the client, and the client agrees to customize the objects in particular ways specified by the framework. Customization is supported by allowing subclassing of key framework component classes.

Each framework in Choices consists of a number of abstract classes that define the interface to the framework, the operations on framework objects, and the potential customizations that the framework supports.
Internal framework data structures and algorithms are written in terms of the abstract classes, allowing the programmer to customize the framework by customizing key framework classes within the constraints enforced by the framework. Compile time checking enforces type safety and guides the programmer in implementing customizations by requiring particular behaviors and by enforcing the use of the interface provided by the framework.
2.1 Framework Example: Process Subsystem
A typical framework in Choices is the process framework, which provides a hardware-independent process subsystem for the kernel. Key classes in this framework are the Process, ProcessContainer and ProcessManager classes. The Process abstract class encapsulates the state of a lightweight thread of execution, i.e. its stack and control state. This class defines methods for manipulating process state. The ProcessManager class represents a centralized process scheduling facility that manages a global ready queue and the timeslice timers. The ProcessManager defines methods for creating, suspending and killing processes. These methods are called by clients in the kernel, which are entities outside the process framework and therefore may not manipulate Process objects directly. Most ProcessManager methods invoke Process methods to do their work. The ProcessContainer abstract class encapsulates a scheduling queue used by the ProcessManager to hold Process objects. The ProcessContainer class has methods for adding and removing Processes from the queue according to some unspecified policy. Figure 1 shows the class hierarchies for the Process and the ProcessContainer classes.

[Figure 1 diagram. Classes under Process: ApplicationProcess, SystemProcess. Classes under ProcessContainer: BSDNetScheduler, FIFOScheduler, PriorityProcessContainer, SemaphoreRegulatedFIFOScheduler.]

Figure 1: Class hierarchies for the Process and ProcessContainer classes

Customization of the process framework proceeds by subclassing framework classes and overriding certain methods to provide a custom implementation of a framework object. For example, subclasses of the ProcessContainer class must define methods for adding and removing Processes from the queue represented by the subclass; these subclasses thus define the priority policy of the queue. A FIFOScheduler subclass representing a first-in, first-out queue can be implemented by having the subclass maintain a linked list internally with additions placed at the front and removals taken from the tail. The Process class can also be customized as needed by subclassing, and in Choices the SystemProcess subclass encapsulates a process that only runs in the kernel. This subclass defines or overrides methods as needed to optimize details such as register handling for a process that never leaves the kernel protection domain. Likewise there is an ApplicationProcess subclass that implements independent kernel and application stacks for processes used to run applications, which in general need to call the kernel.
Since subclasses must obey the framework specifications for behavior by providing the same interface to other framework objects, they can be inserted easily into an existing design [8].

Customization by subclassing is a common theme in this paper and in good object-oriented framework design. Object-oriented programming facilitates incremental specialization and refinement of object behavior via subclassing, and these properties carry over into frameworks. Customizing an object-oriented framework by extending the class hierarchy also contributes greatly to ease of maintenance and understanding. Most importantly, these benefits are achieved in an operating system without a loss of performance. Our experience with process migration shows that the organization required by object-oriented frameworks actually has substantial performance benefits.
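The subclassing scheme described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the Choices source: the class names Process, ProcessContainer and FIFOScheduler come from the text, but the method names (add, remove, isEmpty), the name field, and the helper drainOrder are hypothetical.

```cpp
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for the Choices Process abstract class.
class Process {
public:
    explicit Process(std::string name) : name_(std::move(name)) {}
    virtual ~Process() = default;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Abstract scheduling queue. Framework code is written against this
// interface only; the queueing policy is left to subclasses.
class ProcessContainer {
public:
    virtual ~ProcessContainer() = default;
    virtual void add(Process* p) = 0;   // enqueue according to the subclass policy
    virtual Process* remove() = 0;      // next process to run, or nullptr if empty
    virtual bool isEmpty() const = 0;
};

// First-in, first-out policy: additions go to the front of a linked
// list and removals are taken from the tail, as described in the text.
class FIFOScheduler : public ProcessContainer {
public:
    void add(Process* p) override { queue_.push_front(p); }
    Process* remove() override {
        if (queue_.empty()) return nullptr;
        Process* p = queue_.back();
        queue_.pop_back();
        return p;
    }
    bool isEmpty() const override { return queue_.empty(); }
private:
    std::list<Process*> queue_;
};

// Framework-side code (hypothetical helper): it sees only the abstract
// interface, so any ProcessContainer subclass can be substituted.
std::string drainOrder(ProcessContainer& ready) {
    std::string order;
    while (Process* p = ready.remove()) {
        order += p->name();
        order += ' ';
    }
    return order;
}
```

Because drainOrder depends only on ProcessContainer, a priority-based subclass could replace FIFOScheduler without changing any framework code, which is the substitutability property the section relies on.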
2.2 Framework Performance Benefits: Process Migration
Choices supports very low-latency process migration. On a SPARCstation 2 it takes 13.9 ms to migrate an application process over a local area Ethernet [10]. The Sprite operating system achieves the next fastest result found in the literature on comparable hardware, at a latency of 330 ms on a SPARCstation 1 over Ethernet [5]. Choices process migration uses a new algorithm (called the Freeze Free Algorithm [10]) that sends only the absolute minimum state necessary to begin process execution on the destination host. The organization of kernel state required by framework design in Choices dramatically speeds up the excision and insertion of process state from the kernel. Results show that a system such as Accent [11] spends on average 17.5% of overall migration latency simply excising process state from different parts of the system [12]. In contrast, Choices spends only 3.6% of overall migration latency for the combined time to excise at the old
host and to insert at the new host. Since Accent's algorithm, when normalized to the hardware used in our experiments, is still slower than the Choices algorithm, the difference in absolute excision times is even larger than these percentages indicate.

State separation is maintained in the Choices kernel by its object-oriented frameworks. The process management framework does not carry any file or communication link state internally. Any such state required for a process is kept on its behalf by the appropriate subsystem framework. Frameworks export only abstract interfaces. As we use C++, this means that subsystems keep references only to objects of abstract classes. Since state for the file system and the communication links is not entangled in the process system, it does not need to be sent to recreate a process on the remote side. Consequently this state is not part of the absolute minimum state necessary to resume execution and can be sent later, after the process has resumed execution, outside of the critical path.
3 Message Passing System

One of the goals of Choices was to provide an environment for parallel computing on hardware platforms ranging from shared memory multiprocessors to networks of workstations. The message passing system provides a uniform model for parallel programming on all these platforms [9, 4, 13, 14]. The hardware features that the message passing system can exploit to provide high performance vary widely between shared memory multiprocessors and networks of workstations and seem to require separate implementations of the system. Separate implementations lead to problems maintaining different versions of the system, duplicated functionality and possibly incompatible interfaces. Frameworks allow the design of a single system that runs efficiently on multiple platforms and can be ported to new platforms while reusing significant amounts of common functionality.

Another important issue addressed in the design of the system is that communication patterns vary from application to application. Generic message passing systems, which do not take these patterns into account, suffer performance penalties. The message passing system should therefore permit customization of the features it provides so that the application's communication patterns can be implemented optimally. The system should also permit extensions to those features to fit an application's requirements.

Our experience with the prototype of the message passing system implemented for Choices can be summarized as follows:

- Frameworks are an effective technique for designing and implementing a portable message passing system.
- Subclassing can be used to provide static customizability and extensibility but is not convenient for dynamic extensions to the system.
- Providing portability and extensibility does not affect performance significantly.
- Application-specific customizations of a message passing system improve application performance.
We discuss these experiences in detail, with examples, in the rest of this section.
3.1 Customizable Framework for Message Passing
Figure 2 shows the general structure of the message passing system framework. Each component in the figure is implemented as a subframework. Each framework is implemented as a collection of cooperating classes.

[Figure 2 diagram. Components: API, Naming, Message Handling, Transport, Reliability, Data Transfer Framework, Synchronization, Network I/O.]

Figure 2: Message Passing System

The Application Programmer's Interface (API) defines the interface to the message passing system. The Message Handling subsystem includes class hierarchies rooted at the MessageContainer and ContainerRepresentative abstract classes. A MessageContainer is a communication endpoint similar to a port. A process that owns a MessageContainer can receive messages using it. A ContainerRepresentative is a proxy for a MessageContainer and is used to send messages to the MessageContainer. The Naming framework implements a name service which allows MessageContainers to be bound to a name and looked up using that name. A ContainerRepresentative is returned on a successful lookup.

Classes in the Transport framework implement the mechanism that is used to transfer a message from a sender to a receiver. Different subclasses implement different copying techniques to transfer data between processes on a shared memory machine, or fragmentation and reassembly techniques for transfer of data over a network. The Reliability subframework is used to implement different reliability semantics, including at-least-once and exactly-once, for an implementation of the message passing system on a network of machines. The classes of the Synchronization framework implement different locking strategies for critical sections in the message passing system, including test-and-set, test-test-and-set, array, and queue locks. The appropriate subclass can be chosen based on critical section size, expected contention, target architecture and memory constraints. Buffering strategies including single copy, double copy and copy by reference are implemented by the classes of the Data Transfer framework. Experimental results have shown that the copying strategy is an application-specific parameter that can have a significant impact on application performance [9]. The Network I/O framework provides an interface to the network on which the message passing system runs.

The message passing framework allows customizability at a fine grain as well as a coarse grain. For example, an implementation of the system for a shared memory machine is obtained by eliminating the reliability subframework. Implementations for different kinds of networks are obtained by customizing the network I/O subframework. Obtaining different reliability semantics requires subclassing a few classes of the reliability subframework.
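To make the Synchronization framework concrete, the following is a minimal sketch of two of the locking strategies named above, test-and-set and test-test-and-set, behind a common abstract class. It is written with standard C++ atomics rather than the Choices primitives, and the names SpinLock, tryAcquire and withLock are assumptions introduced for illustration.

```cpp
#include <atomic>

// Hypothetical abstract lock interface; client code picks a subclass
// based on expected contention and target architecture.
class SpinLock {
public:
    virtual ~SpinLock() = default;
    virtual void acquire() = 0;
    virtual bool tryAcquire() = 0;   // single attempt, never spins
    virtual void release() = 0;
};

// Test-and-set: every spin iteration performs an atomic exchange.
class TestAndSetLock : public SpinLock {
public:
    void acquire() override { while (!tryAcquire()) { } }
    bool tryAcquire() override {
        return !held_.exchange(true, std::memory_order_acquire);
    }
    void release() override { held_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> held_{false};
};

// Test-test-and-set: spin on a plain load and attempt the exchange only
// when the lock looks free, which reduces traffic under contention.
class TestTestAndSetLock : public SpinLock {
public:
    void acquire() override {
        for (;;) {
            while (held_.load(std::memory_order_relaxed)) { }
            if (tryAcquire()) return;
        }
    }
    bool tryAcquire() override {
        return !held_.exchange(true, std::memory_order_acquire);
    }
    void release() override { held_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> held_{false};
};

// Critical-section helper written against the abstract class only.
template <typename Body>
void withLock(SpinLock& lock, Body body) {
    lock.acquire();
    body();
    lock.release();
}
```

Code that calls withLock does not change when one strategy is swapped for the other, so the choice of lock can be made per platform or per application, as the section describes.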
3.2 Performance
The results reported in this section support the contention that a customized message passing system improves application performance. Figure 3 gives analytical results for the number of network messages, network interrupts and send buffers required to reliably implement the communication required by the Simplex application on an unreliable network with multicast capability when using PVM, Amoeba, and Choices [9]. Both PVM and Amoeba are examples of systems not designed for customizability. PVM cannot be customized to use hardware multicast and therefore uses simple sends to implement multicast, increasing the message and interrupt counts. PVM also acknowledges each application message individually. Amoeba, on the other hand, uses hardware multicast, but its multicast protocol uses a sequencer and is not configurable. This leads to a higher message count than necessary. Because a sequencer is used, the number of buffers used is unbounded. All simple sends are implemented as RPCs, which again leads to higher network message counts. The Choices system can be easily configured to use hardware multicast and customized to take application message patterns into account in reliability subsystem optimizations. This reduces the number of network messages and interrupts. Figure 4 shows the impact these optimizations have on the actual execution time of Simplex on Choices relative to PVM. Other applications, including Cholesky and FFT, and distributed system services like gang scheduling show similar performance improvements.
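As a worked example, the analytical counts of Figure 3 can be evaluated for a given number of nodes N. The sketch below assumes the figure's entries read 6N - 4 and 12N - 8 for PVM, 3N + 4 and 8N + 4 for Amoeba, and N + 2 and 4N for Choices (network messages and interrupts respectively); the struct and function names are hypothetical.

```cpp
// Overheads per system as functions of the node count N, following the
// formulas in Figure 3 (send-buffer counts are constants and omitted).
struct Overheads {
    int messages;    // network messages
    int interrupts;  // network interrupts
};

constexpr Overheads pvm(int n)     { return {6 * n - 4, 12 * n - 8}; }
constexpr Overheads amoeba(int n)  { return {3 * n + 4, 8 * n + 4}; }
constexpr Overheads choices(int n) { return {n + 2, 4 * n}; }
```

For N = 8 these expressions give 44, 28 and 10 network messages and 88, 68 and 32 interrupts for PVM, Amoeba and Choices respectively, so under these formulas the gap between the systems widens linearly with the number of nodes.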
System              PVM        Amoeba       Choices
Network Messages    6N - 4     3N + 4       N + 2
Interrupts          12N - 8    8N + 4       4N
Send Buffers        1          unbounded    2

Figure 3: Simplex Overheads

Number of Nodes                        2     4     8
Percentage Improvement for Simplex     10    14    21

Figure 4: Simplex Performance

4 Adaptive Distributed File Systems

A distributed file system allows file access and sharing over a network of hosts. File servers manage the physical repositories containing files on disk or other secondary media, while clients make requests to the appropriate servers for files in response to application demands. File caching keeps recently accessed file data in higher-speed memory in caches. If the data is accessed again, the latency for retrieval is reduced since it can be fetched from the cache. Caching significantly improves performance in distributed file systems.

The Choices operating system includes adaptive caching in its distributed file system. The design goals for a flexible file caching service include:

- Flexibility: support for a variety of different caching strategies.
- Extensibility: permit the addition of new strategies easily.
- Customizability: allow the system to select the appropriate strategy for the file based on the computing environment.
- Performance: perform better than conventional file systems.

The Choices distributed adaptive file system consistently outperforms non-adaptive caching schemes for a variety of access patterns, applications and file types. In particular, it reduces cache misses by 20.6%, network load by 24.2%, and runtime by 36.6% for the sample workloads we measured.

The file caching service in Choices is built as an object-oriented framework. The framework defines the relationships between its component objects. The system is customized by choosing components in the framework and plugging them together. It permits performance tuning based on the applications accessing files, file access patterns, the host configuration, and the network configuration. For example, caching strategies vary according to the amount of host memory available. A diskless workstation needs to cache files in main memory, while a workstation with local storage can cache using its local disk. Whole-file caching at the client side is possible in the latter case and inappropriate in the former. Network environments with relatively low bandwidth and high transfer latencies call for more aggressive caching at the client end, suitable prefetching strategies, or larger transfer sizes.
The design of the Choices distributed file system permits flexibility in the choice of caching strategy and is customizable according to the host's computing environment. Each file may have its own caching strategy, as well as its own secondary cache. Caches may exist on both the client and the server side, and caching strategies on the client and server sides may vary even for the same file.

At issue is how the system selects a caching strategy for a given file. Application behavior is, in general, unknown to the system. Modifying applications to provide hints is tedious and error prone, and application developers may not have the expertise to provide good hints. In addition, the computing environment will vary from host to host. Therefore, the Choices adaptive file caching service automatically selects the appropriate caching strategy from multiple available strategies. Application writers may optionally provide hints to the caching service.

Our solution is to define file access attributes for a given file (e.g. random or sequential). The system observes file access behavior and records it as file attributes. In addition, the system's selection heuristics evolve with the computing environment. The observed behavior is used to select the appropriate caching strategy. This approach works because prior studies show strong inertia in file access patterns: files are likely to be accessed in the same way, so previous behavior predicts future behavior.
The file attributes we use are: temporary versus permanent, single-access versus multi-access, large versus small, random access versus sequential, and whether a local disk cache is beneficial. These are stored as a vector of 5 bits. In a UNIX file system, this can fit into the unused bits in a UNIX inode's meta-data, thus incurring no extra space overhead. The file attributes are used to select the caching parameters as described in Table 1.
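The five attributes fit naturally into a small bit vector. The sketch below shows one way such a vector might be encoded and used to choose caching parameters. The bit assignments, the CachingParameters fields and the selection rules are illustrative assumptions, not the heuristics Choices actually uses.

```cpp
#include <cstdint>

// One bit per file attribute; the bit positions are hypothetical.
enum FileAttribute : std::uint8_t {
    kTemporary     = 1u << 0,  // set: temporary,    clear: permanent
    kMultiAccess   = 1u << 1,  // set: multi-access, clear: single-access
    kLarge         = 1u << 2,  // set: large,        clear: small
    kSequential    = 1u << 3,  // set: sequential,   clear: random access
    kLocalDiskGood = 1u << 4,  // set: local disk caching is beneficial
};

static_assert((kTemporary | kMultiAccess | kLarge | kSequential |
               kLocalDiskGood) == 0x1F,
              "all five attributes fit in a 5-bit vector");

// A subset of the caching parameters, reduced to on/off switches.
struct CachingParameters {
    bool readAhead;
    bool freeBehind;
    bool localDiskCache;
};

// Hypothetical selection rule: read ahead for sequential files, free
// cache units behind the reader only when the file is not expected to
// be accessed again, and use the local disk when the bit says it helps.
CachingParameters selectParameters(std::uint8_t attrs) {
    const bool sequential  = (attrs & kSequential) != 0;
    const bool multiAccess = (attrs & kMultiAccess) != 0;
    CachingParameters p{};
    p.readAhead      = sequential;
    p.freeBehind     = sequential && !multiAccess;
    p.localDiskCache = (attrs & kLocalDiskGood) != 0;
    return p;
}
```

Since the whole vector occupies five bits, it can be stored in otherwise unused inode bits as the text notes, and the selection step reduces to a handful of mask tests per file open.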
Parameter             Description
Read ahead            The amount of data to read ahead, assuming sequential access.
Prefetch              The whole file is fetched into the cache before reading.
Write behind          Coalesce writes into a single large write. Spread writes out over time.
Free behind           Using knowledge of sequential access, cache units are freed after being accessed.
Zombie lifetime       File objects which are deleted become zombies. Zombies keep cached data alive. If a file is opened again after being closed, the data does not have to be read into the cache again.
Local disk caching    Local disk caching is beneficial for multi-access and frequently read files, and not for write-only or read-once files.

Table 1: File caching parameters in Choices.

[Figure 5 diagram. Labeled elements: Host Memory Manager, free pages, pages in cache, File Cache, File, Cache Memory Manager, Migration Policy, local or remote file, Secondary Cache (optional), Tertiary Storage.]

Figure 5: Architecture of file caching service in Choices.

Figure 5 depicts the architecture of the file caching service in Choices. The Host Memory Manager manages the allocation of main memory on the host. Each file, whether local or remote, cached on the host
is allocated a File Cache. The semantics of the behavior of the cache are defined by modules that determine how data is cached in main memory, the cache replacement policy, and how cached data is moved in and out of main memory. The Cache Memory Manager implements "general housekeeping" for the cache, including keeping track of where the data is cached and negotiating with the Host Memory Manager for more memory. The Migration Policy object is policy rich. It defines data transfer and cache replacement policies. The architecture separates the policy-free modules from the policy-rich modules, enabling the migration policy to change without losing track of where the data is cached in main memory.

We measured the performance of our file system framework using two trace-driven workloads. The first was building the Choices operating system itself. This involved many programs and many small files. The second was SMS, a scientific computation for seismic migration, with a few large files and large transfers. We compared the adaptive caching strategy against several increasingly sophisticated strategies:

- LRUCLK: a standard LRU caching strategy,
- ZOMBIE: LRUCLK plus zombie reclamation,
- SMART: ZOMBIE plus read-ahead, write-behind, free-behind,
- LCACHE: SMART plus local disk caching, and
- ADAPTIVE: the previously described adaptive caching strategy.

We also compare each of these against standard SunOS NFS. Tables 2 and 3 summarize the performance for the two workloads. The metrics we use are the program run time; the cache miss indicator, i.e. the amount of data read into the cache; and the network load, i.e. the amount of data transferred over the network. The environment was a 10 Mbps Ethernet connecting two Sun SPARCstation 2s with 32 MB of memory and a 1 GB Micropolis disk.

Strategy          Runtime (secs)   Cache miss indicator (4 kb pages)   Network load (Mb)
LRUCLK            654              7281                                33.60
ZOMBIE            615              6787                                31.93
SMART             614              6642                                30.95
LCACHE 16 Mb      631              8582                                36.81
ADAPTIVE 16 Mb    606              6809                                27.92
LCACHE 32 Mb      614              7329                                17.71
ADAPTIVE 32 Mb    598              6486                                15.84
LCACHE 64 Mb      600              7512                                14.85
ADAPTIVE 64 Mb    596              6471                                14.85
SunOS NFS         803              -                                   -

Table 2: Adaptive caching service: performance comparison for building Choices.

Strategy          Run time (secs)  Cache miss indicator (4 kb pages)   Network load (Mb)
LRUCLK            719              47624                               453
ZOMBIE            716              47555                               453
SMART             531              47570                               453
LCACHE 64 Mb      487              47536                               260
ADAPTIVE 64 Mb    442              47595                               256
LCACHE 128 Mb     438              47562                               155
ADAPTIVE 128 Mb   417              47550                               155
SunOS NFS         2007             -                                   -

Table 3: Adaptive caching service: performance comparison for SMS.

The results show that ADAPTIVE has fewer cache misses and better main memory management, which offsets the increased demand on main memory due to local disk caching. It also makes better use of the local disk, and it consistently performs better than the non-adaptive strategies.
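The separation between policy-free and policy-rich modules in the caching architecture can be illustrated with a small sketch. It assumes a much simplified cache that tracks block numbers only; MigrationPolicy is the name used in the text, while FileCache's interface, the two policy subclasses and all method names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

// Policy-rich module: decides which cached block to evict.
class MigrationPolicy {
public:
    virtual ~MigrationPolicy() = default;
    // `blocks` is ordered from least to most recently used and is non-empty.
    virtual int chooseVictim(const std::list<int>& blocks) const = 0;
};

class EvictLeastRecent : public MigrationPolicy {
public:
    int chooseVictim(const std::list<int>& blocks) const override {
        return blocks.front();
    }
};

class EvictMostRecent : public MigrationPolicy {
public:
    int chooseVictim(const std::list<int>& blocks) const override {
        return blocks.back();
    }
};

// Policy-free module: keeps track of which blocks are cached. The policy
// object can be replaced at any time without disturbing this bookkeeping.
class FileCache {
public:
    FileCache(std::size_t capacity, std::unique_ptr<MigrationPolicy> policy)
        : capacity_(capacity), policy_(std::move(policy)) {}

    void setPolicy(std::unique_ptr<MigrationPolicy> policy) {
        policy_ = std::move(policy);
    }

    void access(int block) {
        if (capacity_ == 0) return;          // nothing can be cached
        blocks_.remove(block);               // re-inserted below as most recent
        if (blocks_.size() == capacity_) {
            blocks_.remove(policy_->chooseVictim(blocks_));
        }
        blocks_.push_back(block);
    }

    bool contains(int block) const {
        return std::find(blocks_.begin(), blocks_.end(), block) != blocks_.end();
    }

private:
    std::size_t capacity_;
    std::unique_ptr<MigrationPolicy> policy_;
    std::list<int> blocks_;                  // least recent first
};
```

Calling setPolicy leaves the list of cached blocks untouched, which mirrors the property claimed for the architecture: the replacement policy can change while the cache keeps track of where data resides.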
5 Distributed Shared Memory

Distributed Shared Memory emulates shared memory over networked computers. In the emulation, local memories of networked computers are treated as caches of a virtual shared memory; in reality, data exists only in the caches. To avoid latency of data access, it is essential that these caches maintain local copies, as in the case of shared memory multiprocessors. However, when a processor modifies cached replicated data, all of the copies must somehow be updated before other processors access their local data. Similarly, if two processors modify their caches simultaneously, the implementation must choose between the values written. Various solutions to these two problems differentiate distributed shared memory systems: Kai Li's pioneering system used a single-writer per page policy [15], Munin introduced weak consistency models [16, 17], and recent research considers other models and implementation techniques [18, 19, 20, 21, 22]. In our work, we have investigated three aspects: architectures for building protocols [23], correctness of protocols [24, 25], and distributed memory models for high-latency networks [26]. Here we show how the improved architectures promote reuse of protocol code, and illustrate how the distributed memory model permits efficient application execution over high-latency networks.
5.1 Extensible Virtual Memory System Architecture
A virtual memory system manages the data necessary to maintain the correspondence between virtual and physical pages, and also interacts with the file system to manage paging. To implement distributed shared memory, the virtual memory system must also interact with the networking system. Thus, it becomes convenient to treat the virtual memory system as an event-driven system with events like pageFault, pageOut, and remoteRequest, and implement it using state machines. State machines for ordinary virtual memory, copy-on-write, and distributed shared memory consistency protocols share significant amounts of behavior. Further, a state machine for distributed shared memory should be usable wherever an ordinary memory machine may be used; that is, a distributed shared memory machine should be a subtype of an ordinary machine. In general, we would like to use all standard object-oriented constructions such as subclassing, composition or delegation with state machines, and would like to reason about substitutability.

Object-oriented state machines [23] were developed to achieve systematic code reuse for protocols. In Choices, such machines structure the interaction of the virtual memory system with the rest of the system. This is especially convenient in deriving custom distributed memory consistency protocols; one user [27] of our system added checkpointing to our distributed shared memory implementation in a few weeks without ever consulting us. However, the state machines are only a part of the overall architecture. We are isolating and identifying other aspects of our system to generalize it beyond virtual memory management using design patterns [28, 29].
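The subtype relationship between an ordinary virtual memory machine and a distributed shared memory machine can be sketched with ordinary C++ subclassing. The event names pageFault, pageOut and remoteRequest come from the text; the two-state page model, the class names and the trivial single-copy protocol are assumptions made for illustration and are far simpler than the object-oriented state machines of [23].

```cpp
// Hypothetical per-page states.
enum class PageState { Invalid, Resident };

// Ordinary virtual memory machine: reacts to local paging events only.
class VMStateMachine {
public:
    virtual ~VMStateMachine() = default;
    virtual void pageFault() { state_ = PageState::Resident; }
    virtual void pageOut()   { state_ = PageState::Invalid; }
    virtual void remoteRequest() { /* ordinary memory ignores network events */ }
    PageState state() const { return state_; }
protected:
    PageState state_ = PageState::Invalid;
};

// Distributed shared memory machine: inherits the local behavior and
// refines only the handling of requests arriving from other hosts.
class DSMStateMachine : public VMStateMachine {
public:
    void remoteRequest() override {
        // Toy single-copy protocol: hand the page over and invalidate
        // the local copy.
        if (state_ == PageState::Resident) {
            state_ = PageState::Invalid;
            ++transfers_;
        }
    }
    int transfers() const { return transfers_; }
private:
    int transfers_ = 0;
};

// Client code written against the ordinary machine; it works unchanged
// with the distributed subtype.
void touchAndEvict(VMStateMachine& machine) {
    machine.pageFault();
    machine.pageOut();
}
```

Only remoteRequest is overridden, so the paging behavior shared by both machines is written once, which is the kind of protocol code reuse the section describes.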
5.2 Customizing Consistency Maintenance
Ideally, a distributed shared memory implementation would be indistinguishable from a true multiprocessor while allowing high-performance computation. However, maintaining the consistency of data copies requires substantial communication that hinders performance. Weakly consistent memories relax consistency requirements by utilizing synchronization information that is implicit in a program. For example, applications contain synchronization constructs such as locks, barriers and task queues that define high-level consistency requirements needed by the application. Thus, if a process p modifies some data protected by a lock, p knows that a different process q will not access that data without first acquiring the lock. Therefore, the effect of write operations by p need not be visible until it releases the lock or even until q subsequently acquires the lock. Further, only the last modification need be communicated. Thus, consistency-related communication is reduced, and application performance improves.

[Figure 6 plots. Three panels showing speedup against number of processes: MatMult (400 x 400 int), SOR (1024 x 3072, 50 iterations), and QSORT (512K int). Each panel has curves for Xunet, Xunet (coord), Ethernet, and Ethernet (coord).]

Figure 6: Performance with and without Coordinators

However, now the lock acquisition time becomes a communication bottleneck, especially if the distributed shared memory operates over a wide area network with latency measured in milliseconds. For instance, if p and q are on different machines and q wants to acquire the lock from p, q must ask p for the lock and perhaps repeat its request, and when p grants the lock, p must wait for the acknowledgment from q to be sure that q now has the lock. Instead, if p knows that q will ask for the lock and access the data, p may proactively forward both as soon as it is done, reducing communication. Further, if p knows that it will later interact with q, then even the explicit acknowledgment may be eliminated.

In short, just as it is argued that the standard shared memory is overly consistent, we argue that traditional synchronization constructs are overly conservative. We suggest a principled approach to analyzing and exploiting patterns in synchronization based on the logic of knowledge [24, 25], which enables us to formalize and reason about statements involving what one process knows about another.

We use a notion of coordinators [26] to explore the fine details of traditional synchronization constructs. A prototype implementation has shown the effectiveness of this approach. In this implementation, applications are written using a library of synchronization constructs that implicitly inform the implementation about synchronization patterns. In addition, we incorporate user-level hints, as well as prediction based on access histories.
Together, all this information minimizes synchronization related communication, piggybacking it on top of data transfers. Figure 6 shows some preliminary performance results for some applications. The performance data presents comparisons over a local area ethernet and XUNET, an experimental widearea ATM network with a average bandwidth of 123 Mb/s for 64KB buers and latency of 36ms. In all three experiments, we compare performance with applications with and without adaptive coordinators (this corresponds to lazy release consistency [17]). Noticeable improvement occurs for SOR (Successive Over Relaxation) only with XUNET since barrier synchronization is not very expensive with local area networks. For QuickSORT which is bounded by task queue contention, we get improvement for both local and wide area networks, and the higher bandwidth and packet size appear to make up for latency.
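The lock hand-off argument in this section can be made concrete with a small message-counting model. The sketch below is not the coordinator implementation; the `Knowledge` levels, the `Message` record, and the assumption that data is fetched by a separate request/reply pair in the conservative case are all illustrative choices introduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the lock holder p knows about the next acquirer q (hypothetical levels).
enum class Knowledge {
    None,          // p knows nothing in advance
    NextAcquirer,  // p knows q will ask for the lock and access the data
    NextAndLater   // p also knows it will interact with q again later
};

struct Message {
    std::string description;
    bool onCriticalPath;  // true if q must wait for it before using the data
};

// Lists the messages exchanged for one hand-off of the lock from p to q.
std::vector<Message> handOff(Knowledge k) {
    std::vector<Message> trace;
    if (k == Knowledge::None) {
        trace.push_back({"q -> p: lock request", true});
        trace.push_back({"p -> q: lock grant", true});
        trace.push_back({"q -> p: acknowledgement", false});
        trace.push_back({"q -> p: data request", true});   // assumed separate fetch
        trace.push_back({"p -> q: data", true});
    } else {
        // p forwards the lock and the last modification together when it is done.
        trace.push_back({"p -> q: lock + last modification", true});
        if (k == Knowledge::NextAcquirer)
            trace.push_back({"q -> p: acknowledgement", false});
        // With NextAndLater the acknowledgement rides on a later message.
    }
    return trace;
}

// Time q waits before it can use the data, counting one-way network delays.
double waitMillis(const std::vector<Message>& trace, double oneWayLatencyMs) {
    assert(oneWayLatencyMs >= 0.0);
    int hops = 0;
    for (const Message& m : trace)
        if (m.onCriticalPath) ++hops;
    return hops * oneWayLatencyMs;
}
```

Under these assumptions, and treating the 36 ms XUNET latency as a one-way delay, `waitMillis(handOff(Knowledge::None), 36.0)` yields 144 ms while the forwarded cases yield 36 ms, which is the kind of saving the text attributes to exploiting knowledge about the other process.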
6 An open visual model

Whereas the designs and implementations of modern object-oriented operating systems are driven by new demands for user-specific tailorability and extensibility, these systems remain black boxes whose internals cannot easily be examined or modified by the user or application. Analyzing and customizing the run-time dynamics of systems built as black boxes is difficult and time-consuming. Adapting system components for non-traditional applications, including but not limited to multimedia, remains a daunting task for application programmers[30].

To address the issue of navigating the diverse environment provided by the Choices system, we developed an open visualization and manipulation model that complements the direction taken by object technology[31]. The model enhances system comprehensibility and usability by systematically representing arbitrary system components as objects within a common visual notation for browsing and manipulation. Thus, system administrators, operating system developers, application programmers, and users can "see into" the black box to watch, inspect, configure, and manipulate the system within the constraints imposed by reliability and security.

Our experience has been that graphical interface tools that methodically model the inner workings of an object-oriented operating system visually can reduce and guide the effort needed to comprehend, manage, or tailor the system. Indeed, visualization has been established as effective in a variety of software understanding-intensive tasks, including dynamic analysis of sophisticated concurrent programs[32]. Our approach exploits the advantages of interactive software visualization to simplify both the scrutiny and the steering of the run-time properties of an object-oriented operating system.

We implemented an interactive interface tool, OS View [31], to browse and configure Choices using our approach. The tool visualizes and manipulates behavioral and structural dynamics, aiding understanding and programming. The viewer may navigate and explore the Choices environment using OS View:
- All operating system objects are visualized and, subject to protection and security considerations, can be directly manipulated. Relational queries allow selective scrutiny of special groups of objects respecting user-selectable properties and attributes.
- Choices permits new services to be automatically loaded in the system as needed by application and system programs without rebooting the kernel[3]. OS View allows scanning or browsing of all the currently loaded services and their corresponding classes and class hierarchies. The visually displayed services and class hierarchy trees reflect the running operating system, not the compiled system: they depict dynamic system state and evolution. Both system- and user-level services can be dynamically reconfigured, customized, and evaluated through the graphical tool.
- Classes of interest can be selected from the browser to create concurrently animated views portraying the interaction patterns of the class instances within and across subsystems. Such views reveal deep and fine-grain system implementation knowledge at a high level of abstraction.
Figure 7 shows the main control panel for OS View. The key operating system objects that provide a gateway for accessing information about all other system objects are listed vertically as raised buttons under the label Select Main Object. The bottom of the control panel displays sunken buttons that provide access to some of the tool's real-time monitoring capabilities. The viewer explores Choices by pointing and clicking on the operating system objects, which are represented as graphical images. Subject to reliability and security constraints, the viewer may interactively load or activate alternative services, for example, to experiment with competing resource management policies and mechanisms. Figure 8 displays the performance statistics for two memory management policies that are being interactively evaluated. This experiment demonstrates that OS View also acts as a performance guidance tool, allowing users to dynamically construct and analyze specialized system services with specific features.
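The kind of side-by-side policy evaluation shown in Figure 8 can be sketched outside the kernel as two interchangeable subclasses of one abstract class, driven by the same reference string. The text does not say which two memory management policies were compared, so FIFO and LRU page replacement are stand-ins chosen here, and `ReplacementPolicy`, `FifoPolicy`, `LruPolicy`, and `countFaults` are hypothetical names, not Choices or OS View classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Abstract policy: the shared logic lives here, subclasses specialize one hook.
class ReplacementPolicy {
public:
    explicit ReplacementPolicy(std::size_t frames) : frames_(frames) {
        assert(frames > 0);
    }
    virtual ~ReplacementPolicy() = default;

    // Returns true if referencing `page` caused a page fault.
    bool reference(int page) {
        auto it = std::find(resident_.begin(), resident_.end(), page);
        if (it != resident_.end()) {
            onHit(it);
            return false;
        }
        if (resident_.size() == frames_)
            resident_.pop_front();  // the front of the deque is the next victim
        resident_.push_back(page);
        return true;
    }

protected:
    using Iter = std::deque<int>::iterator;
    virtual void onHit(Iter it) = 0;  // how a hit changes the eviction order
    std::deque<int> resident_;

private:
    std::size_t frames_;
};

// FIFO: a hit leaves the eviction order untouched.
class FifoPolicy : public ReplacementPolicy {
public:
    using ReplacementPolicy::ReplacementPolicy;
protected:
    void onHit(Iter) override {}
};

// LRU: a hit moves the page to the most-recently-used end.
class LruPolicy : public ReplacementPolicy {
public:
    using ReplacementPolicy::ReplacementPolicy;
protected:
    void onHit(Iter it) override {
        int page = *it;
        resident_.erase(it);
        resident_.push_back(page);
    }
};

// Runs one policy over a reference string and reports its fault count.
std::size_t countFaults(ReplacementPolicy& policy, const std::vector<int>& refs) {
    std::size_t faults = 0;
    for (int page : refs)
        if (policy.reference(page)) ++faults;
    return faults;
}
```

Because both policies are reached through the same `ReplacementPolicy` interface, the measuring code does not change when one policy is swapped for the other, which is the property that makes interactive comparison practical.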
Figure 7: OS View's main control panel.

Figure 8: On-the-fly comparative studies of memory management policies.

7 Conclusion

The theme of our research is object-orientation and its use in building a customizable operating system. Choices demonstrates that operating systems are realizable as collections of customizable, object-oriented frameworks. The system includes a rich set of frameworks devoted to subsystems such as process management, message passing, file systems, and memory management. The frameworks define how the components in a system fit together and interact. The abstract framework is instantiated by concrete classes that implement the functions of the abstract classes. By choosing suitable components and plugging them together, custom support within the operating system is possible.

The object-oriented frameworks within Choices significantly facilitate the system's flexibility, extensibility and manageability. Custom implementations are introduced with the use of inheritance in order to specialize the components within a framework. We took care to design the abstract classes in our frameworks to support the widest variation of implementations possible. At the same time, frameworks guide the process of specialization. Customizations are derived by subclassing from existing classes, and refined by further subclassing. Frameworks thus support the incremental specialization of components.

The three experiments in customizability we presented show considerable improvement in their respective problem domains. Message passing efficiency was improved as acknowledgements were eliminated and buffer requirements bounded. Adaptive file caching improved the run time, decreased cache misses and reduced network load in the Choices distributed file system. Customized protocols for maintaining distributed virtual memory consistency resulted in improved performance for parallel applications on both local area Ethernet and our wide area XUNET testbed. Both coarse-grained and fine-grained customization are supported in these frameworks. Coarse-grained customization is obtained by replacing framework components; fine-grained customization is obtained by subclassing from an existing implementation.

Adapting black box systems in order to fit custom implementations is a daunting task without help from the system. OS View attempts to systematically present system components and their interactions visually, aiding in understanding how the components interoperate within the operating system. The tool "opens up" the system and allows the user, subject to security constraints, to experiment with alternative resource management policies and visualize their impact on the system and on application performance.

We are extending our ideas in µChoices [33, 34, 35], the successor to Choices. µChoices is a redesign of Choices as a micro-kernel operating system, and is also constructed on the concept of customizable object-oriented frameworks. It extends Choices by supporting embeddable scripts for customization.
The system can be dynamically customized via the addition and modification of scripts at run time. Customization in existing operating systems, including Choices, is restricted to compile-time modifications. For example, in systems like Mach[1], V++[36], SPIN[37] and Apertos[38], application-specific meta-level policies are implemented by user-provided compiled code. The Synthesis[39] kernel and its offspring, Synthetix[40], adopt a less conventional customization strategy that incorporates run-time code generation. However, the introspection and feedback scheme that triggers the generation of new code is still fixed and pre-determined at compile time. In short, none of the current operating systems provides a fully open architecture for dynamic customization, where new application-specific policies can be generated on the fly, activated, deactivated, or completely removed without excessive recompilation and restructuring.

µChoices allows user programs to embed scripts into the kernel [41]. Scripting objects in µChoices are user-installable alternatives to kernel objects representing resource management policies. These objects execute user-defined policies written in a scripting language. Scripts are attractive as kernel extension languages. As they are interpreted, the safety aspects of embedding user code into the kernel are entirely constrained by the interpreter. The recent introduction of the Java[42] interpreted programming language offers the potential for efficient script execution at speeds comparable to compiled code[42]. In the current release of Java (beta release 1.0b1), the language compiles to a machine-independent byte-code. Future releases will incorporate compilation from byte-code to native machine code as programs are loaded into the interpreter ("just-in-time compilation"), thus trading greater cost at start-up for more efficient later execution.
The Java language seems well suited for the task of dynamic, application-specific customization in an operating system. Designed as a downloadable Internet scripting language, Java scripts cannot manufacture pointers to arbitrary locations in memory. As an object-oriented language, it complements the object-oriented nature of the base Choices system. Just as objects in the base system's framework are customizable through inheritance and dynamic binding in C++, scripts in an object-oriented language can be incrementally specialized through the same language-level mechanisms. This means that it is possible to dynamically extend already-running scripts that implement objects in a class hierarchy. Extensions implemented in an object-oriented language bring the advantages of encapsulation, incremental specialization and code reuse to the extension modules themselves.

Scripts bring other flavors of flexibility to operating systems. We are implementing active capabilities, where a capability is composed, in part, of an encrypted script. The script can encode non-conventional access rights, which are determined by executing the script. Another interesting possibility is the integration of user and system objects through frameworks. The system can expose certain frameworks into which the user can transparently integrate their own objects. The user can also make user subclasses of system-provided classes. A seamless integration of all kernel-resident objects, regardless of whether they originate from the applications at run time or from the system designer at compile time, would be very potent. The user could create new frameworks on the fly to solve problems as they arise, or modify existing ones to the extent that system security policies permit.
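The idea of an active capability, whose access rights are determined by executing an embedded script, can be illustrated with a minimal sketch. This is an assumption-laden stand-in rather than the µChoices design: a C++ callable plays the role of the interpreted script, encryption is omitted entirely, and `AccessRequest`, `ActiveCapability`, and `officeHoursReader` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Information a (hypothetical) kernel would pass to the script on each access.
struct AccessRequest {
    std::string operation;  // e.g. "read" or "write"
    int hourOfDay;          // 0..23
};

// A capability that names an object and carries a script deciding each access.
class ActiveCapability {
public:
    using Script = std::function<bool(const AccessRequest&)>;

    ActiveCapability(std::string object, Script script)
        : object_(std::move(object)), script_(std::move(script)) {}

    // Access is granted only if a script is present and it approves the request.
    bool permits(const AccessRequest& request) const {
        return script_ && script_(request);
    }

    const std::string& object() const { return object_; }

private:
    std::string object_;
    Script script_;
};

// Example of a non-conventional right: read-only access during working hours.
ActiveCapability officeHoursReader(const std::string& object) {
    return ActiveCapability(object, [](const AccessRequest& r) {
        return r.operation == "read" && r.hourOfDay >= 9 && r.hourOfDay < 17;
    });
}
```

A conventional capability would encode a fixed set of rights; here the decision is deferred to the script at the moment of access, so a right such as "read, but only between 9:00 and 17:00" needs no new kernel mechanism.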
References

[1] Richard Rashid. Threads of a New System. UNIX Review, 1986.
[2] J. Mitchell et al. An Overview of the Spring System. In Proceedings of Compcon Spring 1994, February 1994.
[3] Roy Campbell, Nayeem Islam, Peter Madany, and David Raila. Designing and Implementing Choices: an Object-Oriented System in C++. Communications of the ACM, September 1993.
[4] Roy H. Campbell and Nayeem Islam. "Choices: A Parallel Object-Oriented Operating System". In Gul Agha, Peter Wegner, and Akinori Yonezawa, editors, Research Directions in Concurrent Object-Oriented Programming. MIT Press, 1993.
[5] Douglis and Ousterhout. Transparent process migration: Design alternatives and the Sprite implementation. Software - Practice and Experience, 21(8):757-786, August 1991.
[6] V. S. Sunderam. PVM: A framework for parallel distributed computing. Concurrency, Practice and Experience, 2(4):315-340, December 1990.
[7] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, Reading, Massachusetts, 1986.
[8] L. Peter Deutsch. Design Reuse and Frameworks in the Smalltalk-80 Programming System. In Ted J. Biggerstaff and Alan J. Perlis, editors, Software Reusability, volume II, pages 55-71. ACM Press, 1989.
[9] N. Islam. Customized Message Passing and Scheduling for Parallel and Distributed Applications. PhD thesis, University of Illinois at Urbana-Champaign, 1994.
[10] Ellard Roush. The Freeze Free Algorithm for Process Migration. PhD thesis, University of Illinois at Urbana-Champaign, Expected in May 1995.
[11] A. Goscinski. Distributed Operating Systems: The Logical Design. Addison-Wesley, Sydney, Australia, 1991.
[12] Zayas. The Use of Copy-on-Reference in a Process Migration System. PhD thesis, Carnegie Mellon University, 1987. Also Technical Report CMU-CS-87-121.
[13] Nayeem Islam, Robert E. McGrath, and Roy Campbell. "Parallel Distributed Application Performance and Message Passing: A case study". In Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS IV), San Diego, California, September 1993.
[14] Nayeem Islam and Roy H. Campbell. "Design Considerations for Shared Memory Multiprocessor Message Systems". In IEEE Transactions on Parallel and Distributed Systems, November 1992.
[15] Kai Li and Paul Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4):321-359, November 1989.
[16] John B. Carter, John K. Bennett, and Willy Zwaenepoel. Techniques for reducing consistency-related communication in distributed shared memory systems. ACM Transactions on Computer Systems, 1995. To appear.
[17] Honghui Lu, Sandhya Dwarkadas, Alan L. Cox, and Willy Zwaenepoel. Message passing versus distributed shared memory on networks of workstations. In Proceedings of Supercomputing '95, December 1995. To appear.
[18] Divyakant Agrawal, Manhoi Choy, Hong Va Leong, and Ambuj K. Singh. Mixed consistency: A model for parallel programming. In Proceedings of the ACM Symposium on Principles of Distributed Computing, pages 101-110, 1994.
[19] B. N. Bershad and M. J. Zekauskas. Midway: Shared memory parallel programming with entry consistency for distributed memory multiprocessors. Technical Report CMU-CS-91-170, Carnegie Mellon University, September 1991.
[20] Matthew Zekauskas, Wayne A. Sawdon, and Brian N. Bershad. Software write detection for a distributed shared memory. In Operating Systems Design and Implementation (OSDI), 1994.
[21] Kirk L. Johnson, M. Frans Kaashoek, and Deborah A. Wallach. CRL: High-performance all-software distributed shared memory. In Proceedings of the ACM Symposium on Operating System Principles, 1995.
[22] Ranjit John and Mustaque Ahamad. Evaluation of causal distributed shared memory for data-race-free programs. Technical Report GIT-CC-94/34, Georgia Institute of Technology, 1994.
[23] Aamod Sane and Roy H. Campbell. Object-oriented state machines: Subclassing, composition, delegation and genericity. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA'95), pages 17-32, October 1995.
[24] Aamod Sane and Roy H. Campbell. Compiling knowledge based programs. In Proceedings of the ACM Symposium on Principles of Distributed Computing 1995, page 268, August 1995.
[25] Aamod Sane. Synthesizing process interaction protocols. Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign, November 1995. Submitted for publication.
[26] Aamod Sane and Roy Campbell. Coordinated memory: A distributed memory model and its implementation on a gigabit network. Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign, October 1995.
[27] F. X. Nursalim Hadi. Checkpointing in Distributed Virtual Memory by Using Local Virtual Memory. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1995.
[28] Aamod Sane and Roy Campbell. Resource exchanger: A behavioral pattern for low overhead concurrent resource management. In Pattern Languages of Program Design. Addison-Wesley Publishing Company, Reading, Massachusetts, 1996. (To appear).
[29] Aamod Sane and Roy Campbell. Detachable inspector/removable cout: A structural pattern for designing transparent layered services. In Pattern Languages of Program Design. Addison-Wesley Publishing Company, Reading, Massachusetts, 1996. (To appear).
[30] Krueger K., Loftesness D., Vahdat A., and Anderson T. Tools for the Development of Application-Specific Virtual Memory Management. In OOPSLA, pages 48-64, 1993.
[31] Mohlale Sefika and Roy H. Campbell. An Open Visual Model For Object-Oriented Operating Systems. In Fourth International Workshop on Object Orientation in Operating Systems, Lund, Sweden, August 1995.
[32] D. Zernik, M. Snir, and D. Malki. Using Visualization Tools to Understand Concurrency. IEEE Software, pages 87-92, May 1992.
[33] Roy H. Campbell and See-Mong Tan. µChoices: An Object-Oriented Multimedia Operating System. In Fifth Workshop on Hot Topics in Operating Systems, Orcas Island, Washington, May 1995. IEEE Computer Society.
[34] See-Mong Tan, David Raila, and Roy H. Campbell. An Object-Oriented Nano-Kernel for Operating System Hardware Support. In Fourth International Workshop on Object-Orientation in Operating Systems, Lund, Sweden, August 1995. IEEE Computer Society.
[35] Willy S. Liao, David M. Putzolu, and Roy H. Campbell. Building a Secure, Location Transparent Object Invocation System. In Fourth International Workshop on Object-Orientation in Operating Systems, Lund, Sweden, August 1995. IEEE Computer Society.
[36] David Cheriton. The V Distributed System. Communications of the ACM, pages 314-334, 1988.
[37] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers. Extensibility, Safety and Performance in the SPIN Operating System. In Proceedings of the 15th Symposium on Operating System Principles, December 1995.
[38] Yasuhiko Yokote. The Apertos Reflective Operating System: The Concept and Its Implementation. In Proceedings of the 1992 International Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 1992.
[39] Calton Pu, Henry Massalin, and John Ioannidis. The Synthesis kernel. Computing Systems, 1(1):12-32, 1988.
[40] Calton Pu, Tito Autrey, Andrew Black, Charles Consel, Crispin Cowan, Jon Inouye, Lakshmi Kethana, Jonathan Walpole, and Ke Zhang. Optimistic Incremental Specialization: Streamlining a Commercial Operating System. In 15th ACM Symposium on Operating Systems Principles (SOSP'95), Copper Mountain, Colorado, December 1995.
[41] Yongcheng Li, See-Mong Tan, Mohlale Sefika, Roy H. Campbell, and Willy S. Liao. Dynamic Customization in the µChoices Operating System. Submitted to Reflection '96 Conference.
[42] J. Gosling and H. McGilton. The Java Language Environment: A White Paper. Sun Microsystems Computer Company, 2550 Garcia Avenue, Mountain View, California 94043. http://www.sun.com, May 1995.