Towards Embedded Qualitative Simulation: A Specialized Computer Architecture for QSim

Marco Platzner, Bernhard Rinner, and Reinhold Weiss
Institute for Technical Informatics, Technical University Graz
Steyrergasse 17/4, A-8010 Graz, Austria

Abstract
Qualitative simulation is a key inference technique of model-based reasoning and has been successfully demonstrated in areas such as monitoring, fault diagnosis and design. Industrial applications require embedded qualitative simulation, i.e., the qualitative simulator is coupled with the physical process by a set of sensors and actuators. Often the qualitative simulator must additionally satisfy real-time constraints. To advance embedded applications of the qualitative simulator QSim, we have developed a special-purpose computer architecture. Its design is based on parallelization and software-to-hardware (SW/HW) migration. The main features of this design are (i) high performance, (ii) scalability and (iii) increased portability to embedded platforms. The prototype implementation and the experimental results demonstrate the suitability of our approach for embedded qualitative simulation.
Keywords: qualitative simulator QSim, real-time AI, parallel processing, SW/HW migration
This work was partially supported by the Austrian National Science Foundation (Fonds zur Förderung der wissenschaftlichen Forschung) under grant number P10411-MAT.
1 Introduction

After more than a decade of research, the AI technique of qualitative simulation is now on the brink of being applied to real-world problems. The focus of interest is shifting from purely research-oriented issues to more application-oriented ones. Qualitative simulation is concerned with deriving the behavior of a physical (dynamic) system given only weak and incomplete information about it. In qualitative simulation, physical systems are modeled at a higher level of abstraction than in other simulation paradigms such as continuous simulation. In continuous simulation, the physical system is described mathematically by differential equations. Qualitative simulation relies on a further abstraction of these differential equations: the so-called qualitative differential equations (QDEs). Qualitative simulation requires neither a complete structural description of the physical system nor a fully specified initial state. The major strength of qualitative simulation is the prediction of all physically possible behaviors derivable from this incomplete knowledge. Qualitative simulation plays an essential role in qualitative reasoning research and is, furthermore, commonly used as the key inference technique in model-based reasoning. The goal here is to automate tasks that engineers, technicians
and scientists perform when understanding, designing, explaining, monitoring and diagnosing physical systems. Many research projects have demonstrated the applicability of qualitative simulation in these areas. Some of these research systems have already led to commercial products, while others have been applied to real-world problems (see sidebar "Industrial Applications of Qualitative Simulation").
1.1 Application Categories

In a typical industrial application, the qualitative simulator is coupled with a physical process by a set of sensors and actuators. The computer system (including the qualitative simulator among other tools) and the physical process (the so-called environment) form a dedicated unit that is commonly called an embedded system. Based on the degree of this coupling, two application categories can be distinguished:
Off-line Applications
In off-line applications, there is only a loose coupling between the computer system and the environment. Data is transferred off-line between these systems, e.g., via files or via a user interface. The actual performance of the qualitative simulator is not vital for the functionality of the application. Qualitative simulation is used as a tool that interacts with other tools, e.g., CAD systems. Interoperability, adaptability and portability are the important implementation issues. Typical examples of off-line applications of qualitative simulation are design verification and failure mode and effect analysis (FMEA) [13].
On-line Applications
In on-line applications, there is a tight coupling between the computer system and the environment. The computer system must be reactive, which means that it has to compute its output data whenever input data is received from the environment. Moreover, almost all on-line applications of qualitative simulation require real-time behavior, where the timeliness of the computer system's results is vital for the functionality of the system. Here, performance and, even more importantly, predictability play crucial roles: it must be guaranteed that the computer system reacts to inputs from the environment within predefined time windows. As real-time systems are also spatially tightly coupled with the physical process, their design is strongly influenced by resource limitations. Often widely divergent criteria must be met, such as low power consumption, small size, high performance and high reliability. Typical on-line applications of qualitative simulation are monitoring and fault diagnosis [2] [14].

Both categories will become increasingly important in the future. On-line applications, however, are more demanding from the viewpoint of computer system design. Handling complex, real-world problems on-line requires efforts in the design of both real-time qualitative simulators and specialized computer architectures.
1.2 Approaches for Real-Time Qualitative Simulation

Combining AI tasks with real-time behavior is a challenging problem that is being faced more and more often. Based on the taxonomy of Musliner et al. [11], three different approaches can be identified: (i) embedding AI tasks into a real-time system, (ii) embedding real-time tasks into an AI system and (iii) cooperating real-time and AI tasks. In the last approach, neither the AI tasks nor parts of them are required to run in real time; the AI tasks are responsible for planning and scheduling the real-time tasks. Therefore, this approach is not well suited for real-time qualitative simulation. The first two approaches, however, lead directly to reasonable application scenarios:
Embedding Qualitative Simulation into a Real-Time System
Here, the qualitative simulator is forced to meet the deadlines. There are two different methods to achieve this. The first method reduces the high execution time of qualitative simulation and its variance by (i) constraining the qualitative simulation algorithm at the price of a decreased quality of its output, (ii) customizing the qualitative simulation algorithm to a certain problem class, i.e., incorporating domain knowledge to simplify qualitative simulation, and (iii) supporting qualitative simulation by a specialized computer architecture, also often denoted as performance engineering [5]. Although none of these strategies makes an algorithm for qualitative simulation predictable in general, many important real-world problems can still be solved. The second method to force qualitative simulation to meet deadlines is to design incremental or anytime algorithms [4] [12]. The simulation task is made interruptible, and at any time a useful, rather than the optimal, reaction is provided.
Embedding Real-Time Tasks into Qualitative Simulation
Here, most parts of the qualitative simulator remain unchanged. However, some functions that must have predictable execution times are integrated into the system as high-priority tasks. An example scenario for this approach is a monitoring and diagnosis system, where a real-time task monitors the system state and triggers an alarm shutdown if critical values of some parameters are detected. Otherwise, the monitored system state is passed to the diagnosis part of the system.
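To make the scenario concrete, the following C sketch shows the kind of bounded-time check such a high-priority task might perform. The Parameter type, its critical limits and the two possible outcomes are hypothetical assumptions for illustration; in a real system the function would be called periodically by a task of the real-time operating system.

```c
#include <stddef.h>

/* Hypothetical monitored parameter with its critical limits. */
typedef struct {
    const char *name;
    double value;
    double min_critical;  /* below this value the state is critical */
    double max_critical;  /* above this value the state is critical */
} Parameter;

typedef enum { PASS_TO_DIAGNOSIS, ALARM_SHUTDOWN } MonitorAction;

/* High-priority real-time check: a single pass over n parameters with no
   search and no allocation, so its execution time is bounded by n. */
MonitorAction monitor_state(const Parameter *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (p[i].value < p[i].min_critical || p[i].value > p[i].max_critical)
            return ALARM_SHUTDOWN;   /* critical value detected */
    }
    return PASS_TO_DIAGNOSIS;        /* hand the state to the diagnosis part */
}
```

The predictability comes from the fixed loop bound; the unpredictable part, qualitative simulation for diagnosis, only runs after this check has passed.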
1.3 Specialized Computer Architecture for the Qualitative Simulator QSim

QSim [8] is the best-known qualitative simulator. During the past few years, QSim has been widely studied, applied and extended both by its original developers and by researchers worldwide. The insufficient runtime performance of current QSim implementations has two main causes. First, QSim can generate a huge number of system behaviors during simulation; this number depends strongly on the simulation model and on the domain-specific QSim filter algorithms used. Second, QSim is implemented in Lisp and executed on general-purpose computers.

To advance embedded applications of QSim, especially on-line applications, we have developed a special-purpose computer architecture. The main objective of this development is to improve the runtime performance of QSim. This is achieved by parallelizing some QSim functions and mapping them onto a multiprocessor system, and by migrating other functions from software to hardware. Two important aspects for applying qualitative simulation in on-line systems are also addressed in this project. First, the computer architecture is scalable, i.e., its performance can be adapted to the problem complexity. Second, the QSim functions are implemented in 'C', which is far more appropriate than the original Lisp implementation for porting the qualitative simulator to different types of embedded processor platforms.

The development of a special-purpose computer architecture for QSim is a challenging task, since its design requires a thorough prior analysis of the algorithm. However, QSim, and qualitative simulation in general, have not yet been analyzed for computational complexity: neither a formal analysis of the computational complexity nor empirical studies of the behavior of qualitative simulation have been reported [1]. A rough analysis of QSim shows that this algorithm offers only low to medium data parallelism and that its control flow is very irregular, i.e., input-data dependent. Additionally, part of the QSim algorithm is NP-complete.
2 QSim Algorithm

In QSim, models are described as qualitative differential equations (QDEs) or, equivalently, as constraint networks, which consist of variables and constraints. Variables represent system parameters, e.g., velocity or temperature. The value of a variable consists of two parts, a qualitative magnitude (qmag) and a qualitative direction (qdir). Constraints describe relations between system parameters. QSim uses several types of constraints, which represent arithmetic relations (e.g., ADD, MULT and D/DT constraints) and functional dependencies (e.g., M+ and M- constraints) between variables.
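A qualitative value can be represented compactly, as sketched below in C. The encoding is an assumption for illustration, not QSim's data layout: given an ordered landmark list l0 < l1 < ... < lk, an even code 2i means "at landmark li" and an odd code 2i+1 means "in the open interval (li, li+1)"; qdir is one of decreasing, steady or increasing. The helper next_qmags is a deliberately simplified, hypothetical transition rule that ignores changes of qdir and all other QSim transition rules.

```c
#include <stdbool.h>

/* Qualitative direction of change. */
typedef enum { QDIR_DEC = -1, QDIR_STD = 0, QDIR_INC = 1 } QDir;

/* Assumed encoding: even qmag = at a landmark, odd qmag = open interval
   between two adjacent landmarks. */
typedef struct {
    int qmag;
    QDir qdir;
} QValue;

static inline QValue at_landmark(int i, QDir d) { return (QValue){2 * i, d}; }
static inline QValue between(int i, QDir d)     { return (QValue){2 * i + 1, d}; }
static inline bool   is_landmark(QValue v)      { return v.qmag % 2 == 0; }

/* Simplified successor magnitudes implied by qdir alone (qdir changes are
   ignored). Writes up to 2 qmag codes into out, returns how many. */
int next_qmags(QValue v, int max_code, int out[2]) {
    int n = 0;
    if (is_landmark(v)) {
        /* at a landmark: leave it unless steady */
        int next = v.qmag + (int)v.qdir;
        if (next >= 0 && next <= max_code) out[n++] = next;
    } else {
        /* in an interval: stay, or reach the boundary landmark */
        out[n++] = v.qmag;
        if (v.qdir != QDIR_STD) {
            int next = v.qmag + (int)v.qdir;
            if (next >= 0 && next <= max_code) out[n++] = next;
        }
    }
    return n;
}
```

For example, a value at landmark l1 that is increasing can only move into (l1, l2), whereas a value inside (l1, l2) that is increasing may either stay there or reach l2.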
Figure 1: Flow chart of QSim (initial state processing; then, as long as the agenda is not empty: generate possible values, QSim kernel with constraint-filter and form-all-states, global filters).

QSim predicts all possible behaviors of a physical system. A behavior is a sequence of states that represents one possible temporal evolution of the system. The generation of behaviors basically requires the solution of two different problems:
Generation of Initial States
Given a QDE and partial information about the initial state, determine all complete, consistent qualitative states.
Generation of Successor States
Given a QDE and a complete qualitative state, determine its immediate successor states.

The flow chart of the basic QSim algorithm is shown in Figure 1. Initial state processing generates all complete, consistent initial states and stores them in an agenda for further processing. The immediate successors of each state are generated in three successive steps. First, the possible values of all variables for the next time step are determined (generate possible values). Second, the QSim kernel generates all candidates for successor states. Finally, global filters test each candidate state for consistency with the other states of the behavior, e.g., to detect cycles. States surviving all these checks are stored in the agenda. The generation of successor states continues until all states in the agenda have been processed or a resource limit has been exceeded.

Figure 2: Hierarchical structure of the QSim functions (QSim kernel: constraint-filter, consisting of the tuple-filters with their CCFs for D/DT, M+, M-, ADD and MULT and of the Waltz-filter; form-all-states).

The QSim kernel consists of two consecutive functions, constraint-filter and form-all-states, which are further hierarchically structured as shown in Figure 2. The constraint-filter calls several tuple-filter functions, one for each constraint in the QDE. These functions are further divided into so-called constraint-check-functions (CCFs). CCFs are primitive functions, and an individual CCF exists for each constraint type.
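The agenda-driven generation of behaviors described above can be summarized in a short C sketch. It is not the QSim implementation: states are opaque integers, the hypothetical function type SuccessorFn stands in for generate possible values, the QSim kernel and the global filters together, and the depth-first (stack) agenda with a fixed capacity is an assumption. The toy successor function exists only to make the sketch runnable.

```c
#define AGENDA_CAP 256

/* Hypothetical stand-in for "generate possible values + QSim kernel +
   global filters": writes the surviving successor states of `state` into
   out and returns their number. */
typedef int (*SuccessorFn)(int state, int out[], int max_out);

/* Agenda-driven behavior generation. Returns the number of processed
   states. Stops when the agenda is empty or the resource limit is reached.
   Successors that do not fit into the agenda are dropped in this sketch. */
int simulate(const int *initial, int n_initial, SuccessorFn successors,
             int resource_limit) {
    int agenda[AGENDA_CAP];
    int top = 0, processed = 0;
    for (int i = 0; i < n_initial && top < AGENDA_CAP; i++)
        agenda[top++] = initial[i];            /* initial state processing */
    while (top > 0 && processed < resource_limit) {
        int s = agenda[--top];                 /* next state, depth-first */
        int succ[16];
        int n = successors(s, succ, 16);
        for (int i = 0; i < n && top < AGENDA_CAP; i++)
            agenda[top++] = succ[i];           /* states surviving all checks */
        processed++;
    }
    return processed;
}

/* Toy successor function: state s branches into 2s+1 and 2s+2 as long as
   both are <= 6, which yields a tree of 7 states rooted at state 0. */
int toy_successors(int s, int out[], int max_out) {
    int n = 0;
    if (2 * s + 2 <= 6 && max_out >= 2) {
        out[n++] = 2 * s + 1;
        out[n++] = 2 * s + 2;
    }
    return n;
}
```

Starting from the single initial state 0, the sketch processes all 7 toy states; with a resource limit of 3 it stops after 3 states, mirroring the resource-limit termination in the flow chart.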
3 QSim Computer Architecture

The development of the QSim computer architecture is described in the following sections. First, design considerations concerning the parallelization of kernel functions and their mapping onto the QSim kernel multiprocessor are presented. Then the design of the CCF coprocessors is discussed. Finally, a prototype implementation and experimental results are presented.
3.1 Design of the QSim Kernel Multiprocessor

3.1.1 Parallel Constraint-Filter

The constraint-filter uses the possible values of all variables as input data and returns only those combinations of these values that do not violate local consistency conditions. This filtering is performed in two consecutive steps. In the first step, the tuple-filter function discards all tuples that are inconsistent with an individual constraint; thus, one tuple-filter is required for each constraint in the QDE. In the second step, the Waltz-filter function discards tuples that violate conditions between adjacent constraints, i.e., constraints that share a variable. The data-flow graph in Figure 3 reveals that all tuple-filter functions are independent of each other and can be executed in parallel. Therefore, the maximum degree of parallelism is determined by the number of constraints C of the QDE.

Figure 3: Data-flow graph of the constraint-filter for a QDE with C constraints: the possible qualitative values feed the tuple-filters t-f1, ..., t-fC, whose results tuples1, ..., tuplesC feed the Waltz-filter W-f, which outputs the consistent tuples. t-fi denotes the tuple-filter function for constraint i, W-f the Waltz-filter; tuplesi are the tuples consistent with constraint i.

All functions of the constraint-filter can be logically grouped into a master/slave structure of tasks. For each constraint of the QDE, one slave task executes the tuple-filter function for that constraint. The master task is responsible for transmitting the input data to all tuple-filter tasks, receiving the tuple-filters' results and executing the Waltz-filter. The logical structure forms a star with the master as the central element. However, in a star structure the master becomes a bottleneck as the number of slaves increases, which in turn limits the scalability of the computer architecture. To achieve a scalable, high-performance architecture, the tasks of the logical structure are connected in a wide tree topology, which we model as an n-ary tree with n greater than 2. The tree topology ensures scalability. Further, the wide tree is motivated by a class of microprocessors that have several fast communication links on-chip. With the wide tree model, we can easily map the logical structure onto the multiprocessor and, at the same time, exploit its communication facilities. The root node of the tree corresponds to the master task; all other nodes correspond to the slaves of the logical structure.
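The two filtering steps of the constraint-filter can be illustrated with a small sequential C sketch. It simplifies QSim in several ways, all of which are assumptions: constraints are binary predicates over two variables (QSim's ADD and MULT constraints are ternary), every variable ranges over the integers {0, 1, 2}, and the Waltz-filter is written as a plain fixpoint loop. Each tuple_filter call is independent of the others, which is exactly the property the master/slave mapping above exploits.

```c
#include <stdbool.h>

#define DOMAIN 3                    /* assumed: values 0..DOMAIN-1 */
#define MAX_TUPLES (DOMAIN * DOMAIN)

/* Simplified binary constraint over variables a and b. */
typedef struct {
    int a, b;
    bool (*pred)(int va, int vb);
} Constraint;

typedef struct { int va, vb; bool alive; } Tuple;
typedef struct { Tuple t[MAX_TUPLES]; int n; } TupleSet;

/* Step 1, tuple-filter: keep the value pairs consistent with one constraint. */
void tuple_filter(const Constraint *c, TupleSet *out) {
    out->n = 0;
    for (int x = 0; x < DOMAIN; x++)
        for (int y = 0; y < DOMAIN; y++)
            if (c->pred(x, y))
                out->t[out->n++] = (Tuple){x, y, true};
}

/* Value that tuple t assigns to variable v (-1 if c does not mention v). */
static int value_of(const Constraint *c, const Tuple *t, int v) {
    return c->a == v ? t->va : (c->b == v ? t->vb : -1);
}

static bool mentions(const Constraint *c, int v) { return c->a == v || c->b == v; }

/* True if some live tuple of constraint c assigns val to variable v. */
static bool supported(const Constraint *c, const TupleSet *s, int v, int val) {
    for (int i = 0; i < s->n; i++)
        if (s->t[i].alive && value_of(c, &s->t[i], v) == val) return true;
    return false;
}

/* Step 2, Waltz-filter: discard a tuple if an adjacent constraint (one that
   shares a variable) has no live tuple with the same value for that shared
   variable. Repeat until no tuple is removed. */
void waltz_filter(const Constraint *cs, TupleSet *ts, int nc) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nc; i++)
            for (int k = 0; k < ts[i].n; k++) {
                Tuple *t = &ts[i].t[k];
                const int vars[2] = {cs[i].a, cs[i].b};
                for (int j = 0; j < nc && t->alive; j++) {
                    if (j == i) continue;
                    for (int m = 0; m < 2 && t->alive; m++)
                        if (mentions(&cs[j], vars[m]) &&
                            !supported(&cs[j], &ts[j], vars[m],
                                       value_of(&cs[i], t, vars[m]))) {
                            t->alive = false;
                            changed = true;
                        }
                }
            }
    }
}

int live_count(const TupleSet *s) {
    int n = 0;
    for (int i = 0; i < s->n; i++) n += s->t[i].alive;
    return n;
}

/* Example predicate for the usage note below. */
bool less_than(int x, int y) { return x < y; }
```

With the two constraints x0 < x1 and x1 < x2, each tuple-filter keeps three tuples, and the Waltz-filter then leaves only (0, 1) and (1, 2), i.e., x0 = 0, x1 = 1 and x2 = 2.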
Figure 4: Partitioned MULT-CCF. The MULT-CCF is partitioned into four subfunctions, SF1 to SF4: SF1, SF2 and SF3 (which also reads the internal memory) each take a triple of qualitative values and produce a partial result; SF4 combines the partial results into the result.

3.1.2 Parallel Form-All-States

The kernel function form-all-states solves a constraint-satisfaction problem (CSP) [10] with a backtracking algorithm. To find all solutions of the CSP, a large search space, given by the tuples generated by the constraint-filter, has to be processed by a depth-first search. In contrast to the constraint-filter, there is no obvious parallelization; for a parallel implementation of form-all-states, the CSP must be partitioned artificially. In our QSim architecture, a parallel-agent-based (PAB) strategy [9] is used to parallelize the CSP. The basic idea of PAB is to partition the overall search space into smaller independent subspaces, which can be solved in parallel by any sequential CSP algorithm. The degree of parallelism, i.e., the number of independent subspaces, depends on the input data and therefore cannot be determined in advance.

The logical structure of the parallel form-all-states algorithm is similar to that of the constraint-filter; due to the PAB strategy, a master/slave structure is again derived. The master task is responsible for generating subproblems and transmitting them to the slave tasks, and for merging the partial results into the overall result. The slave tasks execute a sequential CSP algorithm to find all solutions in the subspaces.
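The partitioning idea can be sketched in C on a toy CSP. The problem (four variables with three values each, neighbouring variables must differ) and the partitioning rule (fix the first two variables and discard inconsistent prefixes) are hypothetical choices for illustration and not necessarily the partitioning used in [9]. The subspaces are solved one after another here; in the architecture, each call to backtrack would run on a slave.

```c
#include <stdbool.h>

#define NVARS 4
#define NVALS 3

/* Toy CSP: neighbouring variables must take different values. */
static bool consistent(const int *assign, int var) {
    return var == 0 || assign[var] != assign[var - 1];
}

/* Sequential depth-first backtracking: counts all solutions extending
   assign[0..var-1]. This is the algorithm each slave would run. */
static int backtrack(int *assign, int var) {
    if (var == NVARS) return 1;
    int count = 0;
    for (int v = 0; v < NVALS; v++) {
        assign[var] = v;
        if (consistent(assign, var)) count += backtrack(assign, var + 1);
    }
    return count;
}

/* Master: form subspaces by fixing variables 0 and 1, discard inconsistent
   subspaces before distributing them, and merge the partial results.
   Returns the number of subspaces that were actually solved. */
int pab_solve(int *total) {
    int subspaces = 0;
    *total = 0;
    for (int v0 = 0; v0 < NVALS; v0++)
        for (int v1 = 0; v1 < NVALS; v1++) {
            int assign[NVARS] = {v0, v1};
            if (!consistent(assign, 1)) continue;   /* discarded subspace */
            subspaces++;
            *total += backtrack(assign, 2);         /* independent work unit */
        }
    return subspaces;
}

/* Unpartitioned reference solution. */
int solve_sequential(void) {
    int assign[NVARS] = {0};
    return backtrack(assign, 0);
}
```

Of the nine candidate subspaces, three are discarded up front, and the six remaining ones yield the same 24 solutions as the unpartitioned search. How many subspaces survive depends on the constraints, which is why the degree of parallelism cannot be fixed in advance.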
3.2 Design of CCF Coprocessors

The CCFs are executed on specialized coprocessors. The coprocessor design is described here for one of the most complex CCFs, the MULT-CCF. An analysis of the MULT-CCF reveals that this CCF can be partitioned into four subfunctions, as shown in Figure 4. The subfunctions SF1 to SF3 check whether the rules for qualitative multiplication hold for the given input values. These subfunctions require triples of qualitative values as input data and return boolean results. SF3 iterates over additional qualitative values stored in the internal memory. SF4 performs a logical AND operation on the partial results of SF1 to SF3.
The specialized coprocessor implementing the functionality of the MULT-CCF is designed at the gate and register level to obtain maximum performance. The main features of the design are (i) exploitation of parallelism, i.e., the parallel execution of SF1, SF2 and the iterations of SF3, (ii) use of optimized data types, i.e., the number of bits and the coding scheme of the input values and of the values stored in the coprocessor memory, and (iii) use of customized memory architectures, i.e., the internal organization and the access mode of the coprocessor memory. The coprocessor design contains functional blocks for SF1 to SF4, the internal memory, an I/O controller and a function controller [3]. The I/O controller establishes communication with a host processor via two separate communication channels, which enable simultaneous input and output operations. The function controller decodes the instructions and controls the operation of all other functional blocks of the coprocessor. Three instructions are defined for the MULT-CCF coprocessor: two instructions update the internal memory, and the third actually executes the MULT-CCF.
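The partitioning of the MULT-CCF into SF1 to SF4 can be mirrored in software, as in the C sketch below. The paper does not list the exact rules inside the subfunctions, so their contents here are assumptions: SF1 checks the sign rule [z] = [x][y], SF2 checks the product rule z' = x'y + xy', and each SF3 iteration checks one stored corresponding-value triple. The value encoding (QVal) and the CorrValue type are hypothetical. Two variants are shown: one that evaluates every partial result and ANDs them like SF4 in the coprocessor, and a short-circuit software version that also reports how far evaluation got.

```c
#include <stdbool.h>

/* Hypothetical encoding: signs of magnitude and derivative in {-1, 0, +1},
   plus an abstract landmark code for the corresponding-value check. */
typedef struct { int sign; int dir; int landmark; } QVal;

/* Corresponding values (x, y, z) with z = x * y, held in CCF memory. */
typedef struct { int x, y, z; } CorrValue;

/* SF1 (assumed content): sign rule [z] = [x] * [y]. */
static bool sf1(QVal x, QVal y, QVal z) { return z.sign == x.sign * y.sign; }

/* SF2 (assumed content): product rule z' = x'y + xy'. */
static bool sf2(QVal x, QVal y, QVal z) {
    int t1 = x.dir * y.sign, t2 = x.sign * y.dir;
    if (t1 * t2 < 0) return true;       /* opposite signs: any direction */
    int required = t1 != 0 ? t1 : t2;   /* both zero gives required == 0 */
    return z.dir == required;
}

/* One SF3 iteration (assumed content): corresponding-value check. */
static bool sf3_step(QVal x, QVal y, QVal z, CorrValue c) {
    return !(x.landmark == c.x && y.landmark == c.y) || z.landmark == c.z;
}

/* SF4: AND of all partial results. The coprocessor computes SF1, SF2 and
   the SF3 iterations in parallel; here `&` simply evaluates all of them. */
bool mult_ccf_all(QVal x, QVal y, QVal z, const CorrValue *cv, int ncv) {
    bool ok = sf1(x, y, z) & sf2(x, y, z);
    for (int i = 0; i < ncv; i++) ok &= sf3_step(x, y, z, cv[i]);
    return ok;
}

/* Software version, short-circuit order SF1, SF2, SF3. *exec_case is
   1 = only SF1, 2 = SF1 and SF2, 2 + k = SF1, SF2 and k SF3 iterations. */
bool mult_ccf_short_circuit(QVal x, QVal y, QVal z,
                            const CorrValue *cv, int ncv, int *exec_case) {
    *exec_case = 1;
    if (!sf1(x, y, z)) return false;
    *exec_case = 2;
    if (!sf2(x, y, z)) return false;
    for (int i = 0; i < ncv; i++) {
        *exec_case = 3 + i;
        if (!sf3_step(x, y, z, cv[i])) return false;
    }
    return true;
}
```

The case numbering of the short-circuit version follows the execution cases used in the evaluation of the coprocessor: the more of SF1 to SF3 the software has to run sequentially, the more the parallel hardware can gain.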
3.3 Prototype Implementation and Experimental Results

Figure 5: Prototype of the overall architecture for the QSim computer architecture. The prototype comprises a front-end and processing elements (DSP TMS320C40) connected in a wide tree structure with up to 5 children per node. Some processing elements are equipped with CCF coprocessors (Xilinx XC4013).

Figure 6: Speedup $S_{cf}(n)$ of the parallel implementation of the constraint-filter for the models RCS, QSEA and M1 using n = 1, 2, 3 slave processors.

A prototype of the overall heterogeneous multiprocessor architecture is shown in Figure 5. The digital signal processor TMS320C40 was chosen as the processing element because of its high I/O performance and its six independent communication channels; with one channel needed for the link to the parent node, wide tree structures of up to five children per node can be built. Software is developed in 'ANSI-C' under the distributed real-time operating system Virtuoso, which supports a portable and flexible software design. The specialized CCF coprocessors are implemented on field-programmable gate arrays (FPGAs) of type Xilinx XC4013.
3.3.1 QSim Kernel Multiprocessor

The experimental evaluation of the QSim kernel multiprocessor is based on a comparison of the execution time of the sequential implementation, $t_{seq}$, with that of the parallel implementation using n processing elements, $t_{par}(n)$. From these execution times, the speedup $S(n) = t_{seq} / t_{par}(n)$ is determined. The QSim kernel multiprocessor is evaluated using two different sets of input data, derived from the QSim models RCS (48 constraints) [7] and QSEA (21 constraints) [8]. The parallel implementation of the constraint-filter is additionally evaluated with the data set M1, which is artificially generated and models a QDE with 30 MULT constraints.

Figure 6 presents the speedups of the parallel implementation of the constraint-filter, $S_{cf}$, using 1, 2 and 3 slave processing elements. The best speedup is achieved with input data set M1 because M1 has the longest execution times of the individual tuple-filter tasks, so that computation outweighs the communication between the processing elements.

Figure 7: Speedup $S_{fas}(n)$ of the parallel implementation of form-all-states for the models RCS and QSEA using n = 1, ..., 7 slave processors.

Figure 7 presents the speedups of the parallel implementation of form-all-states, $S_{fas}$, using up to 7 slave processors. Parallel execution of the RCS model shows a superlinear speedup using one and two slave processors. This occurs because the partitioning algorithm discards many inconsistent subspaces of the partitioned CSP, so that the total execution time of the remaining consistent subspaces is smaller than the execution time of the unpartitioned CSP.
3.3.2 CCF Coprocessors

The experimental evaluation of the CCF coprocessors is based on a comparison of the execution time of the software CCF, $t_{sw}$, with the execution time of the host/coprocessor pair for the same CCF, $t_{hw}$. From these execution times, the speedup of the coprocessor, $S_{ccf} = t_{sw} / t_{hw}$, is calculated. This speedup includes the required communication between the host and the coprocessor. The measured execution times and the calculated speedups are subdivided into several execution cases according to the subfunction that causes termination of the CCF. For the short-circuit evaluation of the software MULT-CCF, the execution order SF1, SF2, SF3 is assumed. Six cases are differentiated, where case 1 denotes execution of SF1 only, and case 6 denotes execution of SF1, SF2 and four iterations of SF3. These six cases reflect the most likely situations. In Figure 8, the speedup of the MULT-CCF coprocessor, $S_{ccf}$, is presented for these six execution cases.

Figure 8: Speedup $S_{ccf}$ of the MULT-CCF coprocessor for 6 different execution cases (the measured speedups range from 6.88 to 30.55).
3.3.3 Overall Speedup

The individual speedups resulting from parallelization and coprocessor support, $S_{cf}$, $S_{fas}$ and $S_{ccf}$, allow the overall speedup of the QSim computer architecture, $S_{tot}$, to be determined. The runtime of the kernel is the sum of the runtimes of the two kernel functions, constraint-filter and form-all-states: $t_{seq} = t_{cf} + t_{fas}$. The runtime ratios of the two kernel functions are given by $\alpha = t_{cf} / t_{seq}$ and $\beta = t_{fas} / t_{seq}$, so that $\alpha + \beta = 1$. With these runtime ratios, the overall speedup is defined as

$$S_{tot} = \frac{1}{\dfrac{\alpha}{S_{cf} \, S_{ccf}} + \dfrac{\beta}{S_{fas}}}$$

Given this speedup formula, a high overall speedup can only be achieved if both kernel functions are accelerated appropriately with respect to their runtime ratios. Depending on which of the two basic problems of QSim is being solved, i.e., the generation of initial states or the generation of successor states (see Section 2), two different classes of runtime ratios can be identified. Our empirical runtime analysis reveals that the generation of initial states results in runtime ratios of approximately $\alpha = 0.1$ and $\beta = 0.9$; therefore, a high speedup for form-all-states is important. On the other hand, the generation of successor states results in runtime ratios of approximately $\alpha = 0.75$ and $\beta = 0.25$. Here, a high speedup for the constraint-filter is essential and, therefore, the influence of the coprocessors on the overall speedup increases considerably. With the performance results of our prototype implementation, an overall speedup $S_{tot}$ of 5 to 20 is achieved.
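The effect of the two runtime-ratio classes can be checked numerically with the formula for $S_{tot}$. The C function below is a direct transcription of that formula; the speedup values used in the note after it ($S_{cf} = 2$, $S_{ccf} = 10$, $S_{fas} = 4$) are hypothetical round numbers chosen for illustration, not measured results of the prototype.

```c
/* Overall speedup S_tot = 1 / (alpha / (S_cf * S_ccf) + beta / S_fas),
   with alpha = t_cf / t_seq and beta = t_fas / t_seq (alpha + beta = 1). */
double overall_speedup(double alpha, double beta,
                       double s_cf, double s_ccf, double s_fas) {
    return 1.0 / (alpha / (s_cf * s_ccf) + beta / s_fas);
}
```

With these hypothetical values, the successor-state ratios ($\alpha = 0.75$, $\beta = 0.25$) give $S_{tot} = 1 / (0.0375 + 0.0625) = 10$, whereas the initial-state ratios ($\alpha = 0.1$, $\beta = 0.9$) give only $1 / 0.23 \approx 4.35$. In the latter case even an unbounded $S_{ccf}$ could not push $S_{tot}$ beyond $S_{fas} / \beta \approx 4.44$, which is why form-all-states must be accelerated as well.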
4 Conclusion

The conception of our specialized, high-performance computer architecture for the qualitative simulator QSim relies on the exploitation of parallelism and on software-to-hardware migration. The prototype implementation shows promising performance results for complex simulation models. For applications in embedded systems with varying requirements for computational power, both the scalability of the computer architecture and the availability of sequential and parallel versions of the QSim kernel in 'ANSI-C' are important. Off-line applications will benefit from the high performance of QSim in that more simulation runs can be performed in less time, which will make qualitative simulation a more widely accepted technique for engineering tasks. For on-line applications, our work is mainly a performance engineering approach to support the use of QSim in real-time systems. These advances in performance are the first steps towards future embedded qualitative simulation.
Sidebar:
Industrial Applications of Qualitative Simulation

Qualitative reasoning has been a very active research area since the early 1980s, and qualitative simulation has always played an important role in this area. Yet very little research in this area has led to industrial or commercial applications. In the last few years, however, more attention has been drawn to application-oriented research, as shown by the introduction of special workshops and review papers [6] [15].

Probably the most advanced industrial application of qualitative simulation in the area of monitoring and diagnosis is the recently completed ESPRIT III project Tiger [14]. In the Tiger project, a condition monitoring system for gas turbines has been developed. Reasoning is performed in parallel at three different levels to meet different reasoning and performance requirements. At the top level, qualitative simulation is used to predict the behaviors of the turbine at start-up and in response to load changes. Application sites include a large industrial turbine at Exxon Chemical in the UK and a small aircraft auxiliary power unit turbine at Dassault Aviation in France. Dressler [2] presents a further industrial application, developed together with Siemens, in which qualitative simulation is used for on-line diagnosis and monitoring of ballast tank systems on ships and offshore platforms.

Research at the University of Wales in cooperation with Ford and Jaguar has led to the commercially available design analysis tool Flame [13]. It is important that designs are analyzed for hazardous and safety-critical situations. Flame automatically generates a failure mode and effect analysis (FMEA) of electrical subsystems in cars. FMEA involves the investigation and assessment of the effects of all possible failure modes on a system. Work in this area is also being done at the Technical University of Munich, at Bosch and at Daimler.
References

[1] Ernest Davis. An engaging exploration of QSIM and its extensions. IEEE Expert, 9(6):70-71, December 1994. Book review.
[2] Oskar Dressler. On-Line Diagnosis and Monitoring of Dynamic Systems based on Qualitative Models and Dependency-recording Diagnosis Engines. In Proceedings of the 12th European Conference on Artificial Intelligence, pages 481-485, Budapest, August 1996.
[3] Gerald Friedl, Marco Platzner, and Bernhard Rinner. A Special-Purpose Coprocessor for Qualitative Simulation. In Proceedings of the First International EURO-PAR Conference, pages 695-698, Stockholm, Sweden, August 1995.
[4] Alan Garvey and Victor Lesser. A Survey of Research in Deliberative Real-Time Artificial Intelligence. Real-Time Systems, 6(3):317-347, 1994.
[5] Thomas P. Hamilton. An Architecture for Real-Time Qualitative Reasoning. In Boi Faltings and Peter Struss, editors, Recent Advances in Qualitative Physics, chapter 18, pages 279-294. MIT Press, Cambridge, Massachusetts, 1992.
[6] John Hunt, Peter Struss, and Jean-Paul Krivine, editors. 1st International Workshop on Model-based Systems and Qualitative Reasoning - Perspectives for Industrial Applications. Workshop at the 12th European Conference on Artificial Intelligence. Morgan Kaufmann, Budapest, Hungary, 1996.
[7] Herbert Kay. A qualitative model of the space shuttle reaction control system. Technical Report AI92-188, Artificial Intelligence Laboratory, University of Texas, September 1992.
[8] Benjamin Kuipers. Qualitative Reasoning: Modeling and Simulation with Incomplete Knowledge. Artificial Intelligence. MIT Press, 1994.
[9] Q.P. Luo, P.G. Hendry, and J.T. Buchanan. Strategies for Distributed Constraint Satisfaction Problems. In Proceedings 13th International DAI Workshop, Seattle, WA, 1994. DAI.
[10] Alan K. Mackworth. Constraint Satisfaction. In Stuart C. Shapiro, editor, Encyclopedia of Artificial Intelligence, volume 1, pages 285-293. John Wiley & Sons, Inc., 1992.
[11] David J. Musliner et al. The Challenges of Real-Time AI. IEEE Computer, 28(1):58-66, January 1995.
[12] Michael Pittarelli, editor. Special Section on Anytime Algorithms and Deliberation Scheduling. SIGART Bulletin, 7(2), April 1996.
[13] D. R. Pugh and N. A. Snooke. Dynamic Analysis of Qualitative Circuits for Failure Mode and Effect Analysis. In Proceedings of the Annual Reliability and Maintainability Symposium, Las Vegas, 1996. IEEE.
[14] Esprit Project 6862: Real-Time Situation Assessment of Dynamic, Hard to Measure Systems, 1995. D510.36 Final Application Report - FEP.
[15] Louise Trave-Massuyes and Robert Milne. Application oriented qualitative reasoning. The Knowledge Engineering Review, 10(2):181-204, 1995.