Supporting Intelligent Real-Time Control: Dynamic Reaction on the Maruti Operating System

Robert C. Kohout, James A. Hendler, David J. Musliner, and Ashok K. Agrawala
Department of Computer Science, The University of Maryland, College Park, MD 20742
{kohout,hendler,musliner,[email protected]
ABSTRACT

AI Planning researchers have addressed the need for timely response to unanticipated and/or unpredictable events in dynamic environments by developing "reactive" systems, which are characterized generally by their environmentally-driven response characteristics. The Dynamic Reaction (DR) Model offers a paradigm of reactive program development that addresses many of the requirements of such systems. In this paper, we show how the Maruti hard real-time operating system supports the development of DR systems, and thereby provides the basis for the development of intelligent reactive systems with guaranteed performance characteristics, suitable for mission-critical applications.
1 Introduction
Using computers to control complex real-world applications such as aircraft guidance systems and automatic medical monitoring systems combines the need to respond quickly to changes in the environment with the need to make intelligent choices about courses of action. In such mission-critical systems, the inability to respond correctly and quickly enough to environmental changes (threats) can result in catastrophe. A fundamental problem in the design of intelligent mission-critical control systems is the conflicting requirement for both logical and temporal correctness. Classical, search-based AI programs tend to produce logically correct problem solutions, within the narrow limits of a fixed world model. Unfortunately, the search-space of such systems grows very quickly, to the point where the time required to find a solution in even moderately-sized problem spaces is prohibitive. Moreover, the classical AI planning paradigm requires a variety of "closed world assumptions," including the requirement for a complete, deterministic, and accurate world model, that make it wholly unsuited to realistic, dynamic environments.

Submitted to 1994 Real-Time Systems Symposium
As a result, many AI researchers have concluded that the search-based planning paradigm is ill-suited to the problem of controlling action in dynamic worlds. Instead, they have aimed to achieve intelligent behavior in such environments via stimulus-driven, "reactive" system designs. While reactive architectures have not obviated the need for search-based planners, they are capable of robust behavior in dynamic and/or unpredictable environments. Such systems have several advantages over classical planners: they do not require strong, predictive models of the environment, they do not depend upon the often untenable assumption of a relatively static environment where the reactor is the only agent of change in the world, and most importantly, they are capable of making timely responses in a rapidly changing environment.

Most research into reactive system design suffers from at least one of two serious shortcomings. Efforts to pre-compute and rapidly retrieve the results of classical planners (e.g., [5, 12]) retain the need for strong, causal models of the problem domain, which are difficult to engineer for complex, real-world environments. This difficulty has led to work (e.g., [1, 2, 4, 11]) which abandons the deductive correctness of classical AI for a more pragmatic, engineering-oriented approach. However, these "purely reactive" systems do not consider the problem of explicitly reasoning about (and guaranteeing) the timeliness of their response. The only work in intelligent reaction that does attempt to ensure timeliness (see [8, 9]) falls into the classical paradigm: it requires a robust formal model of the domain, which can be difficult, if not impossible, to derive. Since missing a deadline in mission-critical systems can result in catastrophe, it is imperative that proposed solutions are known to have correct temporal behavior before they are deployed. Correctness cannot be established by testing (see [14]).
To produce reactive systems that can be proven to meet deadlines, we must have explicit mechanisms for controlling their timing. In this paper, we advocate the use of a real-time operating system to support the engineering of intelligent, mission-critical systems. We show how the Dynamic Reaction (DR) model [11] of reactive system design can be used to solve fundamental problems in the design of highly dynamic, guaranteed-response systems, by using features of the Maruti [7, 10] hard real-time operating system.

This paper is organized into four additional sections. In Section 2, we introduce the DR model of reactive system design. In Section 3, we briefly describe Maruti and show how it directly supports the DR model. Section 4 introduces a problem domain designed to embody many of the fundamental problems inherent in the development of intelligent mission-critical applications. We then give a high-level description of Dodger, a DR system that was developed to solve these problems, and which runs on Maruti. We illustrate the straightforward mapping between DR concepts and Maruti constructs, using examples from Dodger. Section 5 summarizes this work, and discusses open issues for future research.
2 Dynamic Reaction

The concept of Dynamic Reaction was introduced by Sanborn and Hendler in [11]. DR was designed to solve problems in dynamic worlds by separating the need to respond appropriately to frequent and potentially threatening change in the world from the rest of the planning process. Sanborn and Hendler distinguished between longer-term goals of achievement (e.g., "get something to eat" or "get across the street"), which can be usefully determined by classical planning, and shorter-term goals of avoidance (e.g., "don't get hit by a car") that are typically associated with maintaining safety in a rapidly changing world. In order to accomplish these short-term goals, they proposed the use of a set of periodic processes, referred to as "monitors", that perform local computations in a bounded, and typically very short, amount of time. While each individual monitor is concerned with only a small part of the environment, the monitors are collectively designed to perform the following tasks:

1. maintain an accurate prediction of the near-term future,
2. notice discrepancies that may invalidate this prediction,
3. react quickly based on perceived discrepancies,
4. consistently attempt to satisfy goals of achievement, despite setbacks.

The DR model is founded on the concept of monitors: a set of them can be engineered to react to unexpected events, while a conventional search-based planner can be used to establish long and intermediate-term goals. Sanborn and Hendler showed that it was possible to structure a set of monitors so that the global behavior of the system was "intelligent", in the sense that it was possible to pursue long-term goals while avoiding short-term, dynamic hazards in the environment.
3 Maruti
Dynamic Reaction proposes a solution to the problem of achieving intelligent behavior in changing worlds by using monitors to directly address the dynamism in the environment. In order to use DR in mission-critical applications, the performance of the individual monitors must be guaranteed: they must be scheduled so that they will always have the resources necessary to respond in a timely fashion. Maruti provides these capabilities.

DR does not commit to the use of any particular methodology for longer-term planning. Integrating planning and reaction is an important part of this project, but we have thus far focused upon the development of the base-level reactive components, and upon showing that they are capable of maintaining safety conditions in a highly dynamic world.
The Maruti operating system [7, 10] is designed to support hard real-time applications on a variety of distributed systems while providing fault tolerant operation. Maruti supports guaranteed-service scheduling, in which jobs that are accepted by the system are guaranteed to meet the constraints of the computation requests. The fundamental processing unit of the Maruti system is known as an object, which consists of two main parts: a control element (or joint) and a set of service access points (SAP's), which are entry points for the services offered by the object. SAP's communicate via one-way message links. Given an initial SAP, known as the root, an application is depicted as a directed computation graph, where the vertices are services and the arcs represent the precedence between services. Thus, children in the graph are services requested by parents. A computation graph (and hence an application) is uniquely identified by its root. A job is a set of applications with their associated periodicity that is submitted to the operating system as a functional unit. Since each SAP has an associated worst-case runtime, and each joint maintains timing information about the resource requirements of its associated object, the operating system is able to verify the schedulability of a job prior to execution. If Maruti accepts a job, it guarantees that each individual SAP will be able to execute within the precedence and timing constraints of the computation graph in which it is contained. When a job is accepted, all of the SAP's in all of the computation graphs of the job are entered into a data structure known as a calendar, which is used at execution time to ensure that each SAP finishes before its deadline.

The synergy between the DR model and Maruti rests on the following observations:
- DR monitors must run periodically, in strictly bounded time.
- Maruti supports the scheduling of periodic processes, and requires time bounds for real-time processes.
- DR stresses several monitors each performing local computation.
- Maruti supports distributed computing and one-way message passing between processes.
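To make the acceptance test described above concrete, the following sketch (our own simplification, not Maruti code) checks one computation graph: the graph is accepted only if the longest chain of worst-case execution times from the root SAP fits within the deadline. Maruti's calendar-based analysis is considerably more sophisticated; the data layout and all numbers here are hypothetical.

```c
/* Sketch of acceptance testing for one computation graph.
 * Services (SAP's) are vertices; edge[p][c] records that parent p
 * precedes child c. The graph is assumed acyclic. */
#include <string.h>

#define MAX_SAPS 32

typedef struct {
    double wcet_ms[MAX_SAPS];         /* worst-case time per SAP */
    int edge[MAX_SAPS][MAX_SAPS];     /* edge[p][c]: p precedes c */
    int n;                            /* number of SAP's in use */
} Graph;

/* Longest worst-case path starting at SAP v. */
static double critical_path(const Graph *g, int v)
{
    double best = 0.0;
    for (int c = 0; c < g->n; c++)
        if (g->edge[v][c]) {
            double t = critical_path(g, c);
            if (t > best)
                best = t;
        }
    return g->wcet_ms[v] + best;
}

/* Accept the graph rooted at `root` iff it can finish by `deadline_ms`. */
int accept(const Graph *g, int root, double deadline_ms)
{
    return critical_path(g, root) <= deadline_ms;
}
```

A job would then be accepted only if every computation graph it contains passes this test with resources to spare for the other graphs in the calendar.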
This close correspondence between the DR model and the Maruti software paradigm allows a straightforward implementation of monitors, as we shall describe below. This allows us to leave the problems of feasibility analysis, scheduling, and execution to Maruti, and focus upon engineering a reactive control system that could function intelligently in a highly dynamic and complex problem domain.

There have been two implementations of Maruti. The first proof-of-concept implementation (described in [7]) was used to develop the Dodger system discussed below. Dodger has since been ported to the new implementation of Maruti (described in [10]), and while the mapping between the two versions of Maruti is straightforward, we use the old terminology in this paper.
4 Dodger: An Implementation of a Dynamic Reaction System on Maruti

We developed the Dodger program to explore fundamental issues in the development of reactive systems using the DR model and Maruti. The problem domain was designed to embody many of the problems which are intrinsic to mission-critical systems. Dodger combines a high level of dynamism and frequent potential catastrophic failures with the need to perform periodic assessments of the state of affairs and make rapid decisions based upon that assessment.
Figure 1: Copter Domain
4.1 The Dodger Domain

The Dodger domain is pictured in Figure 1. The circle represents the situated agent that is under control of the planning program. The short line segments represent moving "cosmic rays", which are potentially fatal if they hit the agent. In this simulation, these rays fly along straight-line paths with fixed velocities. The small dollar sign ($) represents a "goal". The agent is charged with a fairly simple mission: pick up $-goals, while dodging rays, and take the $-goals back to the small, semi-circular "base" at the bottom of the screen. Rays appear at random positions and orientations on the screen edges, and always fly to the opposite edge. As each $-goal is picked up by the agent, a new one is generated at a random position that is guaranteed to be some minimum distance away from the edge of the screen, so that the Dodger is always afforded some minimum time to recognize the appearance of a new ray and react to it.

The Dodger starts at the "base" at the lower center of the screen, and proceeds to pick up one or more $-goals before returning them to the base for credit. When leaving the base, the agent first moves to a predesignated "turning point" a fixed distance above the base, and then begins to head for the $-goal. When returning to the base, the Dodger first moves to this same point, and then proceeds to the base. This is intended to represent a small, two-step "plan" in the traditional sense. The lower left- and right-hand corners of the screen contain areas where the mouse can be placed in order to override the actions being performed by the agent, and either force the agent to go back to the base or to pursue the next $-goal. As we will discuss below, this simple capability can be used as the basis for a hierarchical system of control, such as those advocated by Brooks [2] and Spector [13]. In the current implementation, there are times when the Dodger control program projects a collision with a ray that it will not be able to avoid.
For this reason, the Dodger is given a small number of "shields" that it can use for short bursts of time to protect itself from a collision. Shields are replenished upon returning to the base. Thus, the control program needs to take the number of remaining shields into account when deciding whether to pursue goals or to return to the base.
4.2 Implementation Issues
The architecture of the Dodger system is shown in Figure 2. Each of the ovals represents a periodic process which must be scheduled in the Maruti calendar. The arrows represent the flow of messages between processes. In the following subsections, we shall describe the details of the Dodger architecture, how it is implemented on Maruti, and how it addresses the fundamental problems of reactive system design.
[Figure 2 depicts the Dodger processes: the Control Logic, Command Monitor, Goal Monitor, Safety Monitor (containing the World Model), Shield Monitor, Position Monitor, Input Monitor, and the individual Ray Monitors, with arrows showing the flow of messages toward the ACTION output.]
Figure 2: The Dodger Architecture

4.2.1 Monitors
Most of the indicated processes in Figure 2 are monitors. As we have stated above, the implementation of these is conceptually straightforward in Maruti. For example, in the Dodger system, one monitor is responsible for tracking the availability of a shields resource. Omitting code responsible for maintaining the display (a major complication in a hard real-time environment), this is written concisely in the Maruti programming language as:

    SERVICE Shield_Monitor( dummy )
    Dummy_Msg *dummy ;
    {
        EXEC_TIME 20;
        if (shields_are_low()) {
            SEND("control.Command_Monitor", sizeof(Goal_Msg), &low);
        }
        EXIT();
    }

(In this implementation, the programming language is C, augmented by a number of Maruti-specific constructs which are pre-processed by a Maruti compiler and passed on to a C compiler.)
The dummy message argument merely reflects the fact that this monitor receives no messages from other services. The EXEC_TIME statement indicates that the worst-case execution time of this service is 20 milliseconds, which was the minimum schedulable unit in the prototype implementation of Maruti. The SEND statement implements a unidirectional message link between SAP's, shown on Figure 2 by the arrows between ovals.
4.2.2 High Levels of Dynamism

The DR model addresses the problem of dynamism by isolating the dynamic properties of the world, and employing monitors to detect environmental changes and/or make corresponding adjustments in the behavior of the system. In Dodger, we have a separate monitor for each "cosmic ray" in the environment. Each of these monitors is responsible for periodically reporting the position of each ray to the safety monitor. By using Maruti, we were able to get complete control over the periodicity of these services. In addition, the operating system and DR monitor organization allows us to determine an upper bound on the number of rays that the system can deal with. That is, if it is possible for Maruti to schedule N but not N + 1 ray monitors, we know that the Dodger agent cannot be guaranteed to remain safe if it ever has to track more than N rays.
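A first-order sketch of how such a bound might be computed at design time follows. This is our illustration, using a simple utilization test rather than Maruti's actual calendar-based feasibility analysis, and all timing numbers are hypothetical.

```c
/* First-order sketch of the design-time bound on ray monitors:
 * add identical periodic ray monitors until the processor's
 * utilization budget is exhausted. */

/* Worst-case execution time and period, both in milliseconds. */
typedef struct { double wcet_ms; double period_ms; } Task;

/* Fraction of the processor a periodic task consumes. */
static double utilization(Task t) { return t.wcet_ms / t.period_ms; }

/* Largest N such that the fixed tasks plus N ray monitors
 * still fit within full processor utilization. */
int max_ray_monitors(const Task *fixed, int n_fixed, Task ray)
{
    double u = 0.0;
    for (int i = 0; i < n_fixed; i++)
        u += utilization(fixed[i]);

    int n = 0;
    while (u + utilization(ray) <= 1.0) {
        u += utilization(ray);
        n++;
    }
    return n;
}
```

With, say, one fixed 25 ms service every 100 ms and ray monitors of 25 ms every 200 ms, this yields N = 6; Maruti's own analysis would additionally account for precedence constraints, communication, and calendar slot granularity.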
4.2.3 Safety Conditions

The DR model distinguishes between goals of achievement and the immediate goals of avoidance associated with safety conditions. In Dodger, the system is willing to indefinitely postpone actions intended to further goals of achievement in favor of those which guarantee safety. A separate monitor is used to project possible collisions with rays. If such a collision is detected, a message is sent to the controller that will cause it to adjust its behavior in order to avoid the collision. The appropriate avoidance behavior must be determined quickly. If the control system cannot find a suitable means of getting to safety, the system will use one of its shields.
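Because rays fly along straight-line paths with fixed velocities, collision projection reduces to a small, bounded computation. The following sketch is our own illustration, not the Dodger source; it projects the time of closest approach of a ray to the agent, treated as stationary over the short projection horizon.

```c
/* Sketch of the Safety Monitor's collision projection. A ray moves
 * with fixed velocity; we find the time of closest approach to the
 * agent at (ax, ay) and test whether it comes within `radius`. */

typedef struct { double x, y, vx, vy; } Ray;

/* Returns 1 if the ray passes within `radius` of (ax, ay)
 * during the next `horizon` time units, 0 otherwise. */
int collision_projected(Ray r, double ax, double ay,
                        double radius, double horizon)
{
    double dx = r.x - ax, dy = r.y - ay;   /* relative position */
    double v2 = r.vx * r.vx + r.vy * r.vy;

    /* Time minimizing squared distance; 0 if the ray is stationary. */
    double t = v2 > 0.0 ? -(dx * r.vx + dy * r.vy) / v2 : 0.0;
    if (t < 0.0) t = 0.0;                  /* ray already moving away */
    if (t > horizon) t = horizon;          /* clamp to the horizon */

    double cx = dx + r.vx * t, cy = dy + r.vy * t;
    return cx * cx + cy * cy <= radius * radius;
}
```

Since this is straight-line arithmetic with no loops over the world, its worst-case execution time is easy to bound, which is exactly what the EXEC_TIME declaration of the corresponding service requires.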
4.2.4 World Model

Figure 2 indicates that the Safety Monitor contains a world model. This is not the same as the highly predictive, causal model of the environment that is employed by many traditional AI planning systems. Rather, it is the database of facts that are used by the Safety Monitor to project potential collisions with rays. Since the Safety Monitor must be able to make this determination quickly, it does not make an active attempt to verify that its model is up-to-date. Instead, it assumes that its model is correct, and bases its decision on this assumption. The Maruti object that contains the Safety Monitor also contains services for processing messages from the Position Monitor (which knows the position of the Dodger), and the Ray Monitors, which are responsible for tracking individual rays. These services are separately scheduled processes that are responsible for keeping the data in the world model current.

In this way, we have been able to separate the problems of using a model of the world and keeping it current. In many domains, it is possible that the data in the world model does not need to be updated as frequently as it needs to be used. By separating the services that use the data from those that monitor its accuracy, the DR model allows these functions to be performed at different rates. As we have seen in the case of determining the limit to the number of Ray Monitors the system can support, our approach is also conducive to the determination of resource limitations at design time. Furthermore, by allowing separate processes to be responsible for the currency of different data items, we allow for the possibility of solving resource limitations by adding more processors. Maruti provides transparent support for multiprocessor computations, so no change would be required to distribute the various monitor processes to different computers.
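The separation between updating the model and using it can be sketched as follows. This is our illustration, not the Dodger source: the update function is called from the message-processing services at one rate, while the Safety Monitor reads the same records at another; the periods named in the comments are hypothetical.

```c
/* Sketch of a world model decoupled from its update services.
 * Each datum carries the time it was last refreshed, so staleness
 * can be bounded at design time. */

typedef struct { double x, y; long updated_at_ms; } RayRecord;

#define MAX_RAYS 16
static RayRecord model[MAX_RAYS];

/* Called by a Ray Monitor's message-processing service,
 * e.g. once every 200 ms per tracked ray. */
void model_update(int id, double x, double y, long now_ms)
{
    model[id].x = x;
    model[id].y = y;
    model[id].updated_at_ms = now_ms;
}

/* Used by the Safety Monitor, e.g. every 50 ms. It trusts the model
 * rather than re-verifying it, but the age of each record is
 * available for design-time staleness analysis. */
long model_age_ms(int id, long now_ms)
{
    return now_ms - model[id].updated_at_ms;
}
```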
4.2.5 Multiple, Conflicting Goals of Achievement

The Goal Monitor is responsible for reporting to the controller when the Dodger has picked up a goal. The controller then decides whether to go back to the base or to pursue the next goal that appears on the screen. In order to make this decision quickly, we have used decision theory [3] to allow the controller to make this determination. The system knows where it is, where the next goal is and where the base is. It uses this information, along with the number of shields it has remaining, to rapidly decide what it should do next. The fact that the prototype implementation of Maruti did not support non-real-time processes and/or interruptible computations led us to seek a means of determining our next goal in a small and strictly bounded amount of time. Nonetheless, the need to make decisions rapidly is more general. If the reactive component must always wait for relatively slow, non-real-time processes to reason about conflicting goals and constraints, the system will spend a good deal of its time simply avoiding catastrophes while waiting to be directed towards another goal. Decision theory provides the desired means for rapid, rational decision-making that can be used by low-level systems while waiting for (possible) direction from higher levels.
4.2.6 Command Override

It is often desirable to override the normal operation of a reactive system. For example, whenever the Dodger system runs out of shields, it is exposed to the possibility of being hit by a ray. Therefore, if the number of shields is low, we want the Dodger to return to the base and replenish its supply of shields. The Shield Monitor is responsible for noticing that the supply of shields is low. Rather than report this directly to the control system, it reports this fact to an intermediate process, which we have called the Command Monitor. This is done for two reasons: first, the Shield Monitor is quite simple. It does not know if it has already reported the low shield condition, or if the Dodger is already headed towards the base, etc. A separate process has been made responsible for making this higher-level determination. Secondly, we have also implemented the ability for a human controller to use the mouse to tell the Dodger to go get a goal, or to return to base, regardless of what its own logic tells it to do. The Command Monitor is responsible for making sense out of all of these "outside influences", and for deciding whether or not a message telling the controller to change its goal is warranted.

In more complex systems, we envision a hierarchical system of controllers, much like that described in [13]. At each level of the hierarchy, a separate control system would be responsible for achieving a goal that is provided to it by a higher level, while (if needed) using a set of monitors to ensure that safety conditions are maintained. The mechanism we have employed for command override in Dodger provides a basis for this type of hierarchical control.
4.2.7 Control Logic

All of the monitors described so far are intended to suggest appropriate actions. These multiple, potentially conflicting suggestions are combined by the control logic module. This module is comprised of a finite state machine (FSM), augmented by a "current goal" variable, a goal stack, and two procedures: one to determine what to do when the Safety Monitor determines a collision is pending, and one to decide whether to pursue another $-goal or return to the base after it has just picked up one $-goal. Transitions in the FSM are always triggered by messages coming from the monitors. In our implementation, we have modeled this on Maruti by establishing separate services to process messages from each monitor and update the state of the FSM. In addition, the service that responds to messages from the Safety Monitor may call a function to find a safe place to move towards. If one is found, a new, temporary goal is established as the "current goal", and the previous goal is pushed onto the goal stack. If no "safe harbor" can be found, the Dodger is directed to use up one of its shields. The service that responds to messages from the Goal Monitor will invoke a subroutine to establish the next goal in those cases where the Dodger has just picked up a $-goal.
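The augmented FSM described above can be sketched as follows. The state and event names are hypothetical, not the Dodger source; the sketch shows only how the goal stack lets a temporary safe-harbor goal preempt, and later restore, the goal it interrupted.

```c
/* Sketch of the control-logic module: an FSM augmented by a
 * "current goal" variable and a goal stack. Transitions are driven
 * by messages from the monitors, modeled here as function calls. */

typedef enum { AT_BASE, PURSUING, RETURNING, EVADING } State;
typedef struct { double x, y; } Goal;

#define STACK_MAX 8
typedef struct {
    State state;
    Goal current;             /* the "current goal" variable */
    Goal stack[STACK_MAX];    /* interrupted goals */
    int top;
} Control;

/* Safety Monitor reports a pending collision: push the current goal
 * and head for a safe harbor (assumed already computed). */
void on_collision(Control *c, Goal safe_harbor)
{
    if (c->top < STACK_MAX)
        c->stack[c->top++] = c->current;
    c->current = safe_harbor;
    c->state = EVADING;
}

/* The hazard has passed: pop and resume the interrupted goal. */
void on_safe(Control *c)
{
    if (c->top > 0)
        c->current = c->stack[--c->top];
    c->state = PURSUING;
}
```

In the Maruti implementation, each such transition function corresponds to a separately scheduled service that processes messages from one monitor.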
4.2.8 Interfacing with a Conventional Planner

The controller is designed to pursue a single goal while maintaining safety as described above. Although the entire Dodger system runs in hard real-time on Maruti, we have investigated using unbounded computations in conjunction with a set of real-time DR monitors elsewhere [6]. The controller has been designed to be isolated from the generation of its goals. Thus it is certainly possible to use a classical planning system to generate intermediate and long-term goals in non-real-time, and then provide these, one-at-a-time, to the reactive DR controller.
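One way such a one-goal-at-a-time interface might look is sketched below. This is our illustration, not part of Dodger: the planner deposits its next goal in a single-slot mailbox whenever it finishes deliberating, and the reactive controller polls that slot in constant time inside its periodic service, so it is never blocked waiting on the planner.

```c
/* Sketch of a one-goal-at-a-time handoff from a non-real-time
 * planner to the reactive controller. Single-writer, single-reader;
 * the controller-side poll is O(1) and therefore schedulable. */

typedef struct { double x, y; } Goal;

typedef struct {
    Goal slot;
    int full;    /* 1 when an unread goal is present */
} Mailbox;

/* Planner side: may run arbitrarily long before calling this. */
void planner_post(Mailbox *m, Goal g)
{
    m->slot = g;
    m->full = 1;
}

/* Controller side: bounded-time poll from a periodic service.
 * Returns 1 and fills *g if a new goal was available, else 0. */
int controller_poll(Mailbox *m, Goal *g)
{
    if (!m->full)
        return 0;
    *g = m->slot;
    m->full = 0;
    return 1;
}
```

Between polls that return 0, the controller simply continues pursuing its current goal while the monitors maintain the safety conditions.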
5 Conclusion
The traditional methods of AI are ill-suited to the needs of systems intended to operate in complex, unpredictable and rapidly changing environments. The Dynamic Reaction model of behavior generation was developed as an alternative to the classical, search-based paradigm. This paper discusses the DR-based design of Dodger, and its implementation on the Maruti hard real-time operating system. We have shown the close relationship between the monitor-based DR approach to the development of intelligent reactive systems and the Maruti programming paradigm. We have also described a number of features which will have to be a part of such systems, and have shown how they can be implemented on Maruti and in adherence to the DR model. In particular, we have shown:
- The DR model can be used in highly dynamic domains to address frequently occurring and potentially catastrophic threats in the environment.
- Using DR on Maruti provides system designers the ability to control and reason about the timing properties of their systems. It provides developers the ability to prove temporal properties of reactive systems.
- Isolating the use of a model of a dynamic environment from the problem of keeping the model current is conducive to the early detection of resource limitations.
- The DR model uses monitors to check for the potential violation of safety conditions and override normal system operation if a threat is projected. Maruti guarantees that these monitors will run at fixed, pre-determined periods.
- Decision theory provides a convenient means of reasoning about multiple, conflicting goals of achievement in bounded time, which is often desirable in highly complex and dynamic domains.
- The mechanism of providing goals of achievement one-at-a-time to a reactive system, which pursues that goal while maintaining goals of avoidance, can provide the basis for a hierarchical control system, as well as a means of integrating reactive capabilities with conventional planning.
Maruti provides the basis for proofs of temporal correctness, but we are also interested in developing proofs of logical, behavioral correctness. We are currently exploring means of representing reactive competences so that it is possible to reason about their use in isolation and in combination. This work is a part of a larger-scale investigation into ways of integrating reaction with conventional, non-real-time planning systems, so that they can be used in temporally constrained, mission-critical systems.
References
[1] P. E. Agre and D. Chapman, "Pengi: An Implementation of a Theory of Activity," in Proc. National Conf. on Artificial Intelligence, pp. 268-272. Morgan Kaufmann, 1987.
[2] R. A. Brooks, "A Robust Layered Control System for a Mobile Robot," IEEE Journal of Robotics and Automation, vol. RA-2, no. 1, pp. 14-22, March 1986.
[3] J. Feldman and R. Sproull, "Decision Theory and Artificial Intelligence II: The Hungry Monkey," Cognitive Science, vol. 1, no. 2, pp. 158-192, 1977.
[4] R. J. Firby, "An Investigation into Reactive Planning in Complex Domains," in Proc. National Conf. on Artificial Intelligence, pp. 202-206, 1987.
[5] L. P. Kaelbling, "Goals as Parallel Program Specifications," in Proc. National Conf. on Artificial Intelligence, pp. 60-65, 1988.
[6] R. Kohout, J. Hendler, A. Agrawala, and D. Musliner, "Grounding Dynamic Reaction on the Maruti Operating System," Technical Report CS-TR-3231, University of Maryland Department of Computer Science, April 1994.
[7] D. Mosse, O. Gudmundsson, and A. K. Agrawala, "The Maruti System and its Implementation," IEEE TCOS Newsletter, vol. 5, no. 3, September 1991.
[8] D. J. Musliner, E. H. Durfee, and K. G. Shin, "CIRCA: A Cooperative Intelligent Real-Time Control Architecture," IEEE Trans. Systems, Man, and Cybernetics, vol. 23, no. 6, pp. 1561-1574, 1993.
[9] D. J. Musliner, E. H. Durfee, and K. G. Shin, "World Modeling for the Dynamic Construction of Real-Time Control Plans," to appear in Artificial Intelligence, 1994.
[10] M. Saksena, J. da Silva, and A. Agrawala, "Design and Implementation of Maruti," Technical Report CS-TR-3181, University of Maryland Department of Computer Science, December 1993.
[11] J. C. Sanborn and J. A. Hendler, "A Model of Reaction for Planning in Dynamic Environments," Int'l Journal for Artificial Intelligence in Engineering, vol. 3, no. 2, pp. 95-102, April 1988.
[12] M. J. Schoppers, "Universal Plans for Reactive Robots in Unpredictable Environments," in Proc. Int'l Joint Conf. on Artificial Intelligence, pp. 1039-1046, 1987.
[13] L. Spector, Supervenience in Dynamic World Planning, PhD thesis, University of Maryland, College Park, MD, May 1992. Also available as CS-TR-2899 and UMIACS-TR-92-55.
[14] J. A. Stankovic, "Misconceptions about Real-Time Computing: A Serious Problem for Next-Generation Systems," IEEE Computer, vol. 21, no. 10, pp. 10-19, October 1988.