
High Performance Modelling and Simulation: Progress and Challenges 1

Bernard P. Zeigler, Yoonkeon Moon and Doohwan Kim
AI and Simulation Group, Department of Electrical and Computer Engineering
University of Arizona, Tucson, AZ 85721
Email: [email protected]

George Ball
School of Renewable Natural Resources
University of Arizona, Tucson, AZ 85721
[email protected]

Abstract

Modelling large-scale systems with natural and artificial components requires storage of voluminous amounts of knowledge/information as well as computing speed for simulations to provide reliable answers in reasonable time. Computing technology is becoming powerful enough to support such high performance modelling and simulation. This paper starts with an overview of a project to develop a simulation environment to support modelling of large-scale systems with high levels of resolution. Based on this framework we point out the need for a million-fold increase in today's desktop computing power. We then discuss design features of the high performance environment that have been shown to offer speedups of the scale required. We show how the DEVS (Discrete Event System Specification) formalism provides the efficient and effective representation of both continuous and discrete processes in mixed artificial/natural systems necessary to fully exploit available computational resources.

1 Introduction

Simulation-based design and testing before deployment has become the preferred way of fielding new systems in many areas. For example, simulation is to play a major role in the plans of the US Army in its restructuring for the information age. The complexity of behavior that modern systems can exhibit demands computing power far exceeding that of current workstation technology.

1 This research was supported by NSF HPCC Grand Challenge Application Group Grant ASC-9318169, Rome Labs Contract F30602-95-C-0230 and F30602-95-C-0250. It employed the CM-5 at NCSA under grant MCA94P02.

High Performance Modelling and Simulation: Progress and Challenges, Aug. 10, 1996


To address challenging computing problems using high-resolution, large-scale representations of systems composed of natural and artificial elements, high performance simulation-based design environments are characterized by two levels of intensive knowledge/information processing. At the decision-making level, searches are conducted through vast problem spaces of alternative design configurations and associated model structures; at the execution level, simulations generate and evaluate complex candidate model behaviors, possibly interacting with human participants in real time.

This paper provides an overview of a project that is developing a high performance environment to support modelling and simulation of large-scale systems with natural and artificial components at high levels of resolution. The basic modelling formalism employed is that of discrete events, for representing both continuous and discrete processes. We will argue that, rather like the superiority of digital signal processing over older analog technology, discrete event representations have significant performance and conceptual advantages over continuous dynamic system formalisms.

Figure 1 depicts simulation-based decision making in terms of a layered system of functions. In this paradigm, decision makers, for example forest managers, base their decisions on experiments with alternative strategies (e.g., for reducing the risk of wildfires), where the best strategies (according to some criteria) are put into practice. For a variety of reasons, experiments on models are preferred to those carried out in reality. For realistic models (e.g., of forest fire spread), such experiments cannot be worked out analytically and require direct simulation. The design of our environment to support all these activities is based on the layered collection of services shown in Figure 1, where each layer uses the services of lower layers to implement its functionality.
To provide generic, robust search capability we employ Genetic Algorithms (GAs) as the searcher in the model space. The optimization layer employs the searcher to find good or even optimal system designs (models). Experience with this environment has shown that only large numbers of interconnected processing nodes can provide 1) the memory to hold the enormous amounts of knowledge/information necessary to model complex systems, and 2) the simulation speed required to provide reliable answers in reasonable time. Currently one can marshal such large numbers of computing nodes dedicated to a single problem only in scalable, high performance platforms such as the Connection


Figure 1: Layered Representation of Simulation-based Decision Making (layers, top to bottom: decision making, optimization, modelization, simulation)

Machine (CM-5) or the IBM SP2, which contain up to 1024 processors. However, we will show that at least a million-fold increase in either speed or number of nodes is needed for such systems to support optimization of large-scale models. Unfortunately, the cost of such platforms is beyond the means of most potential users, and only a small number are accessible in national high performance computing centers. By contrast, the numbers and speeds of desktop computers (PCs and workstations) are escalating rapidly, so harnessing these resources might offer a solution. However, the obstacles to networking large numbers of distributed computing resources are formidable. One survey indicates that the largest network cluster contains 130 workstations connected with PVM over Ethernet, far fewer nodes than the massively parallel platforms provide. One significant social barrier to dedicating large numbers of workstations to a single computation is distributed ownership, which tends to discourage shared usage. Two technical barriers, which we addressed in the design of our simulation environment, are heterogeneity and portability. We discuss these issues later in the paper.
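To make the optimization layer concrete, here is a minimal genetic-algorithm sketch in Python. This is an illustrative toy, not the project's distributed GA or DEVS-C++ code: the bit-string encoding, the `evaluate` function (a stand-in for one simulation run under an experimental frame), and all parameters are assumptions.

```python
import random

def evaluate(design):
    # Placeholder "simulation run": score a bit-string design. In the real
    # environment this would execute a DEVS model and report a measure of
    # effectiveness from the experimental frame.
    return sum(design)

def ga_search(bits=16, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half (elitism, so the best
        # design found so far is never lost).
        pop.sort(key=evaluate, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, bits)
            child = a[:cut] + b[cut:]        # one-point crossover
            if rng.random() < 0.1:           # occasional single-bit mutation
                i = rng.randrange(bits)
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=evaluate)

best = ga_search()
```

The point of the sketch is the cost structure: every call to `evaluate` corresponds to a full simulation, which is why the iteration counts reported later dominate the computing budget.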

1.1 The Layered High Performance Environment

As illustrated in Figure 2, the various processes are executed concurrently within a heterogeneous, distributed computing environment [1]. Each GA agent has access to a simulator for executing its experiments. Although the simulator is shown as a single entity, it too could be distributed among the processors. Generally, an experiment consists of several trials testing how well a particular intelligent control (supervisory or management) agent functions in a prescribed problem environment. This environment is represented as a simulation model which is controlled/observed by the agent


Figure 2: Search Controlled High Performance Modelling and Simulation Environment (GA agents, each with a simulator containing an intelligent control agent, an experimental frame, and models at different levels of abstraction).

through an appropriate experimental frame. The model in each simulator may actually be one of several related models at levels of abstraction ranging from low to high resolution. The GA may initially search through the coarser space spanned by the most abstract model before going on to higher resolution searches. As an example, the family of model abstractions could be discrete event models of a watershed varying in resolution. The experimental frame may provide a storm track as input and observe the resulting flooding pattern. The effectiveness of a set of pre-flood stage sensors as an early warning system might then be reported to a GA agent and manipulated by the distributed GA to search for improved sensor placements.

1.2 Needs and Sources for High Performance

The demands of such an environment on any technology capable of supporting it are enormous. Realistic simulations of large models with decision-making components are time consuming. The GA searcher, although robust, is apt to require thousands of simulation evaluations to locate an optimal configuration. For example, we have demonstrated successful applications of our simulation-based optimization environment to fuzzy control system synthesis [2], to 1024-node parameter search problems in optical interconnection network design [3], and to watershed modelling [4]. The number of iterations


required to identify the optimum in these cases is shown in Table 1.

Optimization Run                           Iterations Needed
Fuzzy Control System Synthesis             100,000
Watershed Modelling                        1,000,000
Optical Interconnection Network Design     10,000,000

Table 1: Iteration Requirements for GA Search

These optimizations were run on the CM-5, but for illustration purposes, suppose that they had been executed on a conventional workstation. Were each simulation to require 1 minute, 100,000 GA iterations might require a good part of a person's lifetime. High performance is clearly needed to speed up such computations. In this paper, we will show that speedups of the order of 1,000,000 are in fact attainable with the technology and methodology on the horizon. Table 2 shows where such performance improvements could come from in the simulation environment. We will provide evidence for a) up to 1,000-fold speedup gained by properly mapping continuous models into efficient DEVS approximations, and b) up to 10,000-fold speedup from the application of parallel/distributed processing at each of the simulation and GA search levels. The upper bound of this estimate is based on the best performance achievable on an N-processor system, where N is currently around a thousand (e.g., 1024 in the CM-5). The number of processors in a single platform will increase another order of magnitude with the construction of the 9,000-processor system announced by Intel. By multiplying the three speedup factors, the speedup possible is of the order of 10^8 to 10^11. However, this ignores the connection between the GA and simulation levels. To get the maximum search speedup, we need 10,000 processors, each with its own GA agent. To get the maximum speed out of each agent's simulation, we need each of the 10,000 processors to employ 10,000 processors. Thus we need a system having 10^8 processors at its disposal. Such massive parallelism is only in the research stage at present.
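The compounding arithmetic above can be checked in a few lines. This sketch simply multiplies the lower and upper bounds of the three speedup sources listed in Table 2:

```python
# Speedup ranges claimed in Table 2 (lower bound, upper bound).
devs = (100, 1_000)          # DEVS representation of continuous processes
par_sim = (1_000, 10_000)    # parallel/distributed simulation
par_ga = (1_000, 10_000)     # parallel/distributed GA search

low = devs[0] * par_sim[0] * par_ga[0]     # 10^8
high = devs[1] * par_sim[1] * par_ga[1]    # 10^11

# Full compounding of the two parallelism levels would require 10,000 GA
# processors, each commanding 10,000 simulation processors:
processors_needed = 10_000 * 10_000        # 10^8 processors
```

The product of the two parallelism upper bounds is exactly the 10^8-processor requirement stated in the text, which is why the full compound speedup is out of reach of any single current platform.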
Speedup Order      Source
100 - 1,000        DEVS Representation
1,000 - 10,000     Parallel/Distributed Simulation
1,000 - 10,000     Parallel/Distributed GA Search

Table 2: Sources of Speedups in a High Performance Simulation Environment

We will offer some evidence for the attainability of the individual speedups in Table 2 in the sequel. Then we will comment on the problem of compounding the individual speedups. First we


review the DEVS formalism, which is the basis for the environment under construction. We discuss its use in efficient continuous process representation and in achieving high performance through parallel/distributed simulation. Finally, we describe our distributed GA search architecture and its performance.

2 The DEVS Formalism: Basis for the Simulation Environment

To discuss the performance advantages of discrete event model formulations, we will need to review the modelling formalism, called DEVS, underlying the current high performance simulation environment.

2.1 Brief Review of the DEVS Formalism

We now review the basic concepts of the DEVS formalism and its associated simulation methodology. In the conceptual framework underlying the DEVS formalism [5], the modelling and simulation enterprise concerns four basic objects:

- the real system, in existence or proposed, which is regarded as fundamentally a source of data;

- the model, which is a set of instructions for generating data comparable to that observable in the real system. The structure of the model is its set of instructions. The behavior of the model is the set of all possible data that can be generated by faithfully executing the model instructions;

- the simulator, which exercises the model's instructions to actually generate its behavior;

- experimental frames, which capture how the modeller's objectives impact model construction, experimentation, and validation. Experimental frames are formulated as model objects in the same manner as the models of primary interest. In this way, model/experimental frame pairs form coupled model objects which can be simulated to observe model behavior of interest.

The basic objects are related by two relations:


- the modelling relation, linking real system and model, defines how well the model represents the system or entity being modelled. In general terms, a model can be considered valid if the data generated by the model agrees with the data produced by the real system in an experimental frame of interest;

- the simulation relation, linking model and simulator, represents how faithfully the simulator is able to carry out the instructions of the model.

The basic items of data produced by a system or model are time segments. These time segments are mappings from intervals defined over a specified time base to values in the ranges of one or more variables. The variables can either be observed or measured. The structure of a model may be expressed in a mathematical language called a formalism. The discrete event formalism focuses on the changes of variable values and generates time segments that are piecewise constant. Thus an event is a change in a variable value which occurs instantaneously. In essence, the formalism defines how to generate new values for variables and the times at which the new values should take effect. An important aspect of the formalism is that the time intervals between event occurrences are variable, in contrast to discrete time, where the time step is a fixed number.

Independence from a fixed time step affords important advantages for modelling and simulation. Multiprocess models contain many processes operating on different time scales. Such models are difficult to describe when a common time granule must be chosen on which to represent them all. Moreover, simulation is inherently inefficient since the states of all processes must be updated in step with this smallest time increment; such rapid updating is wasteful when applied to the slower processes. In contrast, in a discrete event model every component has its own control over the time of its next internal event. Thus, components demand processing resources only to the extent dictated by their own intrinsic speeds or their responses to external events. DEVS falls within the formalisms identified by Ho [6] for discrete event dynamical systems (DEDS).
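The cost argument above can be made concrete with a small counting experiment. This is a hypothetical example (the two process rates and the horizon are invented, and no DEVS machinery is involved): a fixed-step simulator must update every component at the rate of the fastest one, while an event-driven scheduler touches each component only at its own events.

```python
import heapq

# Two processes with very different intrinsic event rates.
periods = {"fast": 1.0, "slow": 1000.0}   # time between events, per process
end_time = 10_000.0

# Discrete time: the common step must match the fastest process, and every
# component is updated at every step.
step = min(periods.values())
dt_updates = int(end_time / step) * len(periods)

# Discrete event: each component schedules its own next event time; the
# scheduler only processes components whose events actually fire.
de_updates = 0
agenda = [(period, name) for name, period in periods.items()]
heapq.heapify(agenda)
while agenda and agenda[0][0] <= end_time:
    t, name = heapq.heappop(agenda)
    de_updates += 1
    heapq.heappush(agenda, (t + periods[name], name))

# dt_updates = 20,000; de_updates = 10,010 -- the slow process costs almost
# nothing in the event-driven case.
```

With more slow components the gap widens proportionally, which is exactly the multiprocess inefficiency the text describes.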
Work on a mathematical foundation of discrete event dynamic modeling and simulation began in the 1970s, when DEVS was introduced as an abstract formalism for discrete event modeling. Because of its system-theoretic basis, DEVS is a universal formalism for discrete event dynamical systems (DEDS). Indeed, DEVS is properly viewed as a shorthand to specify systems whose input,


state and output trajectories are piecewise constant. The step-like transitions in the trajectories are identified as discrete events. Discrete event models provide a natural framework in which to include discrete formalisms for intelligent systems such as neural nets, fuzzy logic, qualitative reasoning, and expert systems. However, traditional differential equation models continue to be the basic paradigm for representing the physical environments in which intelligent agents operate. We have proposed that DEVS-based systems theory, incorporating discrete and continuous subformalisms, provides a sound, general framework within which to address modelling, simulation, design, and analysis issues for natural and artificial systems [5]. The universality claims just cited are addressed by characterizing the class of dynamical systems which can be represented by DEVS models. It is known that any causal dynamical system which has piecewise constant input and output segments can be represented by DEVS. We call this class of systems DEVS-representable. In particular, Differential Equation Specified Systems (DESS) are often used to represent both the system under control and the controller, which, as a decision making component, has a natural DEVS representation. DEVS supports construction of new models by interconnecting already existing models as components. Such interconnection, called coupling, is specified in a well defined manner embodied in the formalism of the coupled model [5]. Closure under coupling guarantees that coupling of class instances results in a system in the same class. The class of DEVS-representable dynamical systems is closed under coupling. Closure is an essential property since it justifies hierarchical, modular construction of both DEVS models and the (continuous or discrete) counterpart systems they represent.

2.2 Parallel DEVS

The DEVS formalism, as revised to enable full exploitation of parallel execution, is the basis for the DEVS-C++ high performance simulation environment [7]. A DEVS basic model is a structure:

M = <X, S, Y, δ_int, δ_ext, λ, ta>

X : a set of input events.
S : a set of sequential states.
Y : a set of output events.
δ_int : S → S : the internal transition function.
δ_ext : Q × X^b → S : the external transition function, where X^b is the set of bags over elements of X.
λ : S → Y^b : the output function.
ta : S → R+_0,∞ : the time advance function (the nonnegative reals with ∞ adjoined),

where Q = {(s, e) | s ∈ S, 0 ≤ e ≤ ta(s)} and e is the elapsed time since the last state transition.

DEVS models are constructed in a hierarchical fashion by interconnecting components (which are themselves DEVS models). The specification of interconnection, or coupling, is provided in the form of a coupled model. The structure of such a coupled model is given by:

DN = <X, Y, D, {M_i}, {I_i}, {Z_i,j}>

X : a set of input events.
Y : a set of output events.
D : an index set for the components of the coupled model.

For each i in D, M_i = <X_i, S_i, Y_i, δ_int,i, δ_ext,i, λ_i, ta_i> is a component DEVS model (a DEVS basic structure). For each i in D ∪ {self}, I_i is the set of influencees of i, subject to the constraints that I_i is a subset of D ∪ {self} and i is not in I_i. For each j in I_i, Z_i,j is the i-to-j output translation mapping:

Z_self,j : X_self → X_j,
Z_i,self : Y_i → Y_self,
Z_i,j : Y_i → X_j.

Here self refers to the coupled model itself and is a device for allowing specification of external input and external output couplings. More explicitly, I_self is the set of components that receive external input; also, if self is in I_i, then component i's output appears as external output of the coupled model. The behavior of a coupled model is constructed from the behaviors of its components and the coupling specification. The resultant of a coupled model is the formal expression of such behavior. Closure of the formalism under coupling is demonstrated by constructing the resultant and showing it to be a well defined DEVS. As already stated, such closure ensures that hierarchical construction is well defined, since a coupled model (as represented by its resultant) is a DEVS model that can be coupled with other components in a larger model.
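To make the atomic-model structure tangible, here is a minimal sketch in Python rather than the DEVS-C++ of the actual environment. The Processor model, its fixed service time, and the port-free input/output conventions are all illustrative assumptions, not taken from the paper:

```python
INFINITY = float("inf")

class Processor:
    """Toy atomic DEVS: accepts a job, holds it for service_time, outputs it."""

    def __init__(self, service_time=5.0):
        self.service_time = service_time
        self.phase = "passive"    # standard state variable
        self.sigma = INFINITY     # time to next internal event
        self.job = None

    def ta(self):                 # time advance function ta : S -> R+_0,inf
        return self.sigma

    def delta_ext(self, e, x):    # external transition: uses elapsed time e
        if self.phase == "passive":
            self.phase, self.sigma, self.job = "busy", self.service_time, x
        else:
            self.sigma -= e       # busy: ignore input, account for time spent

    def out(self):                # output function, invoked before delta_int
        return self.job

    def delta_int(self):          # internal transition back to passive
        self.phase, self.sigma, self.job = "passive", INFINITY, None

p = Processor()
p.delta_ext(0.0, "job-1")         # external event: a job arrives
done = p.out()                    # at time ta(s) the job is emitted ...
p.delta_int()                     # ... and the model returns to passive
```

Note how the elapsed time e enters `delta_ext`, the feature the formalism relies on to represent continuous behavior faithfully, and how `sigma = INFINITY` encodes a component that demands no processing until stimulated.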

3 Watershed: DEVS Modelling and Simulation Example

An example of distributed watershed hydrology [8] will illustrate DEVS modelling and simulation. The complexity of watershed hydrology calls for powerful modelling methodologies able to handle spatial interaction over a heterogeneous landscape as well as temporal dynamics introduced by varying rainfall conditions. Geographic Information Systems (GIS) can provide the spatially referenced data necessary to represent topography, rainfall, and soil state distributions. Spatial dynamic models are needed to project such states forward in time. However, conventional differential equation formulations entail an enormous computational burden that greatly limits their applicability. By combining GIS, for state characterization, and DEVS, for dynamic state projection, we derive an approach that can achieve realism within feasible computational constraints, albeit in high performance environments. Figure 3 shows a typical watershed, which consists of several vertical layers, such as air, surface water, subsurface soil, ground water, and bedrock. We divide it into many small cells and develop a conceptual hydrology model for each cell that can be readily mapped into a DEVS component

model. Then we define how the directions of water flow are coded in a grid space and how the varying influx rates in the discretized landscape are linked to create a coherent total runoff.

Figure 3: Cellular Space Representation of a Watershed (rainfall, evapotranspiration, and vegetation acting on cells that span vertical layers of air, surface water, subsurface soil, and bedrock)

3.1 A Conceptual Hydrology Model for a Cell

As shown in Figure 4, we conceptually represent a cell with three vertically connected reservoirs. The rainfall input r(t) is partially intercepted by vegetation cover, and the rest of it, the effective rainfall r_e(t), becomes the source of surface runoff and infiltration. The surface reservoir receives as inputs the effective rainfall r_e(t) and the inflow q_i(t) from the neighbor cells, and generates as outputs the runoff q_o(t) to neighbor cells and the infiltration f(t) to the subsurface reservoir. The underground reservoir works similarly to the surface reservoir, except for infiltration. We define the water depth on a cell, the rainfall excess Rx(t), and the runoff q_o(t) as follows:

Rx(t) = ∫_0^t ( r_e(τ) + Σ_i q_i^i(τ) − f(τ) − Σ_i q_o^i(τ) ) dτ        (1)

q_o^i(t) = C S_i(t) Rx(t) / W_i                                          (2)

where

q_i^i(t) : inflow from the ith neighbor cell,
q_o^i(t) : runoff to the ith neighbor cell,
C : a parameter characterizing the surface roughness at the cell's location,
S_i(t) : slope to the ith neighbor cell,
W_i : distance to the ith neighbor cell.


Figure 4: Conceptual Hydrology Model for a Cell (rainfall r(t), evapotranspiration E(t), and effective rainfall r_e(t) at the vegetation cover; inflow q_i(t), runoff q_o(t), rainfall excess Rx(t), and infiltration f(t) at the surface reservoir; W(t) and p(t) at the subsurface reservoir; g_i(t), g_o(t), and G(t) at the underground reservoir)

The slope S_i(t) is computed by:

S_i(t) = ( Rx(t) + h − Rx_i(t) − h_i ) / W_i,                            (3)

where h is the altitude of the cell, Rx_i the rainfall excess of the ith neighbor cell, and h_i the altitude of the ith neighbor cell (refer to Figure 5).

Figure 5: Connection of Cells (cell A exchanges inflows q_i and runoffs q_o with its neighbor cells 0-3)

As shown in Figure 5, each cell can have at most eight neighboring cells. However, we may consider only four connections by ignoring diagonal neighbor cells, or even a single connection (the direction of maximum runoff), depending on communication overhead costs and the required accuracy of simulation results.2
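A direct numerical reading of Equations (2) and (3) can be written in a few lines. The functions below transcribe the two formulas; the numeric values (roughness C, distances, elevations, and water depths) are invented purely for illustration:

```python
def slope(Rx, h, Rx_i, h_i, W_i):
    # Equation (3): water-surface slope from this cell toward the ith
    # neighbor, using water depths Rx, Rx_i on top of altitudes h, h_i.
    return (Rx + h - Rx_i - h_i) / W_i

def runoff(C, Rx, h, Rx_i, h_i, W_i):
    # Equation (2): outflow proportional to roughness parameter C,
    # slope S_i(t), and water depth Rx(t), divided by the distance W_i.
    return C * slope(Rx, h, Rx_i, h_i, W_i) * Rx / W_i

# Hypothetical cell at 10.0 m altitude with 0.5 m of water; neighbor at
# 9.0 m with 0.2 m of water, 20 m away.
q = runoff(C=2.0, Rx=0.5, h=10.0, Rx_i=0.2, h_i=9.0, W_i=20.0)
# slope = (0.5 + 10.0 - 0.2 - 9.0) / 20.0 = 0.065
# q = 2.0 * 0.065 * 0.5 / 20.0 = 0.00325
```

Note that the slope depends on the water surfaces (altitude plus depth), not on the terrain alone, so flow can reverse as a downhill neighbor fills up.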

3.2 DEVS Representation of Watershed Dynamics

Recall that for a set of component models, a coupled model can be created by specifying how the input and output ports of the components will be connected to each other and to the input and output ports of the coupled model. Due to closure under coupling, the new coupled model is itself a modular model and thus can be used as a component in a yet larger, higher level model. For the simulation of water flow in a cellular space, one can envision the placement of an atomic model at each cell location. Thus there is an array of spatially referenced models that form a coupled DEVS model, which can in turn be coupled to an experimental frame component. DEVS atomic models are stand-alone modular objects that contain state variables and parameters, internal transition, external transition, output, and time advance functions. Two state variables are standard: phase and sigma. In the absence of external events, the model remains in the current state, including the phase state variable, for the time given by sigma, at which time an internal transition occurs. If an external event occurs, the external transition immediately places the model into a new state, depending on the input event, the current state, and the time that has elapsed in that state. The new state may have a new value for sigma, thus scheduling the next internal transition. Note that DEVS recognizes the crucial role that the elapsed time plays in the external transition computation. This enables DEVS to faithfully represent the behavior of continuous systems through discrete events. The differential equation system described in Section 3.1 can be formalized in an atomic model, cell. One way, equivalent to the conventional numerical analysis approach, is to transform the

continuous system into a discrete time approximation. That is, we set sigma of the cell to some constant d. Each cell updates its states and generates outputs to neighbor cells at every fixed time step. However, while straightforward, updating the states of every cell at every time step imposes a heavy computational burden that may be far more than necessary, as suggested in Section 2.1. A more efficient and conceptually satisfying approach is to partition the state space into output-equivalent blocks, as shown in Figure 6. While its state trajectory remains in a block, each cell's

2 Some experimentation indicates that there is not much difference in flow patterns between the 4- and 8-neighbor models, but that the "gradient" (maximum flow) connection generates a distinctly different, and less realistic looking, behavior.


Figure 6: State Space Partitioning for DEVS Representation (the state axis quantized into levels D, 2D, ..., 7D).

output remains constant. Internal events of each cell correspond to boundary crossings in the cell's state space. Given a state on a boundary, each cell predicts the state that will be reached on the next boundary crossing and the time (sigma) to reach it. Due to the heterogeneity of soil conditions, slope, and input flux conditions, each cell fills at a different rate and thus takes a different time to reach its next quantization level. Note that while it is in a quantum block, the cell's output

fluxes to its neighbors are constant, and all input fluxes are constant as well. Therefore, this enables us to compute when and where the next level crossing (increasing or decreasing) will occur. For example, assume that in cell A the rainfall excess (water depth) is Rx_A(t) and the rainfall excess rate (the difference between input and output) is rx_A(t) at time t. Then from Equations 2 and 3 the runoff to the ith neighboring cell, q_A^i(t + Δt), at time t + Δt is calculated by:

Rx_A(t + Δt) = Rx_A(t) + rx_A(t) Δt
Rx_i(t + Δt) = Rx_i(t) + rx_i(t) Δt
S_i(t + Δt) = ( Rx_A(t + Δt) + h_A − Rx_i(t + Δt) − h_i ) / W_i
q_A^i(t + Δt) = C S_i(t + Δt) Rx_A(t + Δt) / W_i                         (4)

where Rx_i(t) and rx_i(t) are the ith neighboring cell's rainfall excess and rainfall excess rate at time t, respectively. If the runoff of a cell to its neighbor is nD (for some integer n and quantum size D) at time t, then we can compute the time advance Δt at which the runoff q_A^i becomes (n − 1)D or (n + 1)D using Equation 4. Since there can be up to eight neighbor cells and each neighbor cell


can be in different states, the times to the next level crossing can differ for each neighbor. In this case we have to take the minimum of these times as sigma. When a cell receives an external event from a neighbor, the message carries the latter's new input flux and new water level. The receiving cell's time and location of next boundary crossing may then differ from those initially predicted. As indicated before, the DEVS formalism can handle this situation: since the elapsed time is known, the actual water level can be computed and sigma recalculated to represent reaching the next quantum level at the new rate, using Equation 4. When the quantized cell model implemented in this way was tested, the results were disappointing. There was little if any reduction in computation time compared with discrete time models. One source of overhead that the quantized model entails, not found in the discrete time model, is the extra calculation of sigma. However, analysis revealed that the main differentiating overhead was due to the structure of the DEVS-C++ simulator (to be described in the next section). Consider the case where N cells, in different states, schedule their next events at different points on the real time axis. In this case, the DEVS-C++ simulator requires N iterations to execute all the events. This requires N times as many iterations as for a discrete time model, in which the DEVS-C++ simulator updates all cells in one iteration. Note that the simulator can perform such a one-iteration update since it implements the new parallel DEVS formalism, where all the inputs and outputs of all simultaneously scheduled cells are properly managed. Thus the discrete time simulation is actually getting more of a boost than it would get in a conventional sequential cell scanning algorithm. This analysis immediately suggested a remedy: squeeze events dispersed over the time axis that are "close enough" to each other into groups that are executed in one iteration.
To accomplish this effect, we quantize the time axis with a time granule of size d, in addition to quantizing the state space. The events between t and t + d are mapped to t + d by upward roundoff, as shown in Figure 7. Note that in this quantized and granulized representation, the outputs may be delayed by d in the worst case, but a change of state still propagates in zero time. For example, when a cell receives a flux change input from a neighbor cell at time t, this causes its state to change at the same time. The cell then sends its updated states to all neighbor cells, also at time t and without delay, setting its sigma to 0 for this purpose.
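The scheduling rule behind the quantized, granulized scheme can be sketched as follows. This is an illustrative reconstruction, not the DEVS-C++ code: it assumes a constant rate of change within a quantum block (as the text argues holds while all fluxes are constant), and the function name and numeric values are invented.

```python
import math

def sigma_to_next_crossing(x, rate, D, d):
    """Time to the next quantum boundary, rounded up onto the time grid.

    x    : current value (e.g., water depth)
    rate : its constant rate of change within the current block
    D    : state quantum size
    d    : time granule size (events are mapped upward to multiples of d)
    """
    if rate == 0:
        return float("inf")               # no internal event to schedule
    if rate > 0:                          # next boundary above x ...
        boundary = (math.floor(x / D) + 1) * D
    else:                                 # ... or below x
        boundary = math.ceil(x / D - 1) * D
    t = (boundary - x) / rate
    return math.ceil(t / d) * d           # upward roundoff (Figure 7)

# A cell filling at 0.3 units/hour from depth 1.25, quantum D = 0.5,
# granule d = 0.01: exact crossing time 0.8333... rounds up to 0.84.
s = sigma_to_next_crossing(1.25, 0.3, 0.5, 0.01)
```

The rounding step is what lets many cells whose crossings fall within the same granule fire in a single simulator iteration, at the cost of output delays bounded by d.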


Figure 7: State Space Partitioning with Granulized Time Axis (state quanta of size D, with events rounded up to multiples of the time granule d).

Figure 8: Elevation Map of Target Watershed.

3.3 Experimental Results

Figure 8 shows the elevation map of a watershed artificially created for experimentation. The target watershed is an array of 30 × 30 cells, each with dimensions of 20 m × 20 m (400 m²). We applied a 50 mm/hour rainfall to the whole watershed for ten hours and observed the behavior of the model during that period and a subsequent dry period. We compared the quantized DEVS models, with different state quanta and time granules, to simple discrete time DEVS models with various time steps. The spatial evolution of flow was visually indistinguishable in all cases. However, to get a quantitative estimate, we measured the runoff at the outlet (lowest point) over the 20 hour


period. Table 3 shows the runoff values of each model in the steady state.

Models               Runoff (m3/hour)   Difference from DM(dt=0.00001)
DM(dt=0.01)          5,152.21           186.98
DM(dt=0.001)         4,982.91            17.68
DM(dt=0.0001)        4,966.86             1.63
DM(dt=0.00001)       4,965.23             0.00
QM(D=50.0, d=0.01)   4,960.00            -5.23
QM(D=1.0, d=0.01)    4,966.00             0.77
QM(D=0.1, d=0.01)    4,965.08            -0.15

Table 3: Runoff of DEVS Models in the Steady State. (DM(dt) is the discrete time model with time step dt; QM(D, d) is the quantized DEVS model with quantum D and time granule d.)

We assume that the discrete time DEVS model with the smallest time step (0.00001 hour) is the most accurate one. The third column of Table 3 is the steady state difference of each model from this 0.00001-step baseline. The results show that the quantized model with quantum 0.1 and time granule 0.01 is closer to the baseline than any other model.

Models               Iterations   Execution Time (sec)
DM(dt=0.01)          4,000            867
DM(dt=0.001)         40,000         8,353
DM(dt=0.0001)        400,000       92,235
DM(dt=0.00001)       4,000,000    726,557
QM(D=50, d=0.01)     35,100           375
QM(D=1.0, d=0.01)    24,200           523
QM(D=0.1, d=0.01)    29,100           604

Table 4: Execution Times of DEVS Models.

Table 4 shows the execution times of the models on a Sparc-1000 processor. Simulation with the quantized model (D=0.1, d=0.01) is 153 times faster than that of the discrete time model with time step 0.0001 (recall it is also more accurate) and 1,203 times faster than that of the discrete time model with time step 0.00001 (from which it differs very little). To analyze the source of this speedup we measured the number of events (the sum of internal and external transitions) during simulation. Figure 9 shows the accumulated total number of events in log scale (base 10). The average number of events per iteration in the model QM(D=0.1, d=0.01) is approximately 6% of that in DM(dt=0.00001). As shown in Table 4 and Figure 9, quantization and granulization greatly reduce both the number of iterations and the number of events without losing accuracy in the steady state. By this reduction we achieve about a 1,000-fold speedup in simulation execution time.


Figure 9: Accumulated Total Number of Events in log scale (base 10) over the 20 simulated hours, for DM(0.00001), QM(0.1, 0.01) and QM(50.0, 0.01).

4 Inside the Simulation Layer

This section reviews the design and implementation of DEVS-C++, the simulation layer of the high performance environment. We discuss some results that show its capability for speedup in parallel/distributed computing environments. Our motivation in building a general purpose and portable discrete-event simulation environment is to shield the modeler from having to deal with the underlying message passing technology while exploiting the speed and memory advantages of high performance, heterogeneous platforms. To accomplish this goal, we developed a Heterogeneous Container Class Library (HCCL) that can be implemented on both sequential and parallel/distributed platforms. Implemented in C++, these classes contrast with other concurrent portable C++ computing models, which compose distributed data structures with parallel execution semantics. HCCL provides concurrency and a parallel computing paradigm at a higher level of abstraction, encapsulating the details of the underlying message passing mechanisms. In particular, applied to discrete event simulation, HCCL abstractions support construction of deadlock-free, synchronous, parallel DEVS simulation environments. Our C++-implemented DEVS simulation environment manages simulation messages at any level of the model structure hierarchy without the need for special hardware [9]. This hardware independence enables it to be readily ported to a variety of distributed, multicomputer platforms.


Our approach exploits closure under coupling of the DEVS formalism to support automated model partitioning and mapping to a processor architecture.

4.1 Implementing the HCCL in Parallel/Distributed Computing Environments

Figure 10: Five Primitives of Container Classes. (The ensemble methods tell-all, ask-all, which?, which-one? and reduce are implemented by both the serial container and the parallel CONTAINER classes.)

A container is an object used to allocate and coordinate the processing of objects transparently to the user. The main features of object-oriented programming, such as classes, inheritance, information hiding, data abstraction and polymorphism, are needed to implement such class functionality. Due to its support of object-oriented programming and its widespread acceptance, especially on scalable multiprocessors, the C++ language was used to implement the container classes. HCCL contains a collection of ensemble methods to treat the items in a container as a unit. The five ensemble methods are enumerated:

- tell-all sends the same command to each object in a container.
- ask-all sends the same query to each object and returns a container holding the responses (which are also objects).
- which? returns the subcontainer of all objects whose response to a boolean query is TRUE.
- which-one? returns one of the objects in the container whose response to a boolean query is TRUE.
- reduce aggregates the responses of the objects in a container to a single object (e.g., taking the sum).


While ensemble methods are more parallel than sequential in nature, they have abstract specifications that are independent of how one chooses to implement them. Thus, using the polymorphism properties of C++, we define two classes for each abstract container class: one (lower-case) implementing the ensemble methods in serial form, the other (upper-case) implementing them in parallel form (Figure 10). The serial implementations run on any architecture that has a C++ compiler. In particular, if the nodes of a parallel or distributed system support C++, then the serial containers will work on them. The distinction between the serial container and the parallel CONTAINER lies in the way message passing is accomplished. In a serial system, message passing reduces eventually to subroutine invocation. However, the implementation of parallel CONTAINERs on distributed memory architectures involves physical message passing among objects residing on different nodes. Such message passing must be implemented using the communication primitives provided by the underlying system. For example, our CM-5 implementation employs CMMD (the CM-5 message passing library). Likewise, a network of workstations can be linked together under the communication primitives supplied by PVM. For shared memory systems, system-specific synchronization primitives are provided and explicit message passing is not required. Figure 11 sketches the construction of parallel CONTAINERs. The User Interface object sends ensemble method messages, in a string format, to a set of coordinators. These ensemble methods are employed to organize parallel objects and collect replies from containers by physical message passing. Thus, the User Interface provides a high level of control to coordinate all containers running on processing nodes. The coordinator and the ordinary container are of the same parallel CONTAINER class, but they differ in the way each executes the commands or queries of the ensemble methods. For instance, multicast ensemble method messages are initiated by the coordinator, which shares global information with its containers. The ordinary containers under the control of the coordinator receive and pass on the messages. The coordinator collects the returned values from its containers and aggregates them into more meaningful information for coordinating those containers. Container entities can be organized into hierarchies using containers and coordinators at any number of levels appropriate to the application. Such structures can be dynamically configured during the application runtime as well as at initial setup using basic container operations (e.g., add

Figure 11: Construction of Parallel CONTAINERs on the CM-5: One-to-one Mapping of Objects to Processing Nodes. (The User Interface sends ensemble method messages to coordinators, which relay them to their containers.)

and remove). Both serial and parallel containers retain the data abstraction and modular construction critical to object-oriented programmability. In addition to the basic container behavior, the parallel CONTAINER classes must maintain concurrency and synchronization of the objects on multiple computing resources. Multicasting ensemble methods to container members can achieve a high degree of concurrency on a parallel architecture. Collection of responses from the container nodes that executed queries keeps them properly synchronized. Using ensemble methods, a collection of processors can be organized and managed by parallel CONTAINER class objects, so a user application program can be layered over such CONTAINERs without concern for coordination of the underlying computing resources. In the next section, we show how parallel CONTAINERs serve as the supporting layer of the parallel DEVS simulation environment.

4.2 Parallel DEVS Implementation over Parallel CONTAINERs


Figure 12: Implementation of DEVS using Container Classes in C++. (DEVS is layered over the serial container and parallel CONTAINER classes, which in turn run over C++ computing platforms such as the CM-5, PCs, UNIX workstations and PVM clusters.)

The parallel/distributed DEVS-C++ simulation environment is distinct from other efforts to build a general and portable simulation environment. One of the key objectives in designing the simulation environment is portability of models across platforms at a high level of abstraction. One should not have to rewrite or modify a DEVS model to run it in serial, parallel or distributed environments. Such portability enables a model to be developed and verified on a serial platform and then quite readily ported to a parallel/distributed platform. DEVS implementation over container classes in an object-oriented form achieves this portability goal. The DEVS formalism is expressed as a collection of objects and their interactions, with the details of the implementation hidden within the objects. The user interacts with only those interfaces that manifest the DEVS constructs while being shielded from the ultimate execution environment. This approach is illustrated in Figure 12. DEVS is implemented in terms of a collection of HCCL classes. In Figure 13, a two-dimensional grid of atomic components called cells is partitioned into blocks. Each block is assigned to a processor node. The closure property of DEVS guarantees that each block can itself be regarded as a DEVS model, which can in turn be considered a component model for a larger configuration. These components are then grouped together to form a new DEVS model (shown as the "top block") which is equivalent in behavior to the original composition of cells. Note that blocks function as containers whether they contain cells or (in the case of the top block) lower level blocks. A cycle in a DEVS simulation is implemented in terms of ensemble methods. It is outlined as follows:


Figure 13: Hierarchical Construction of Block Models from Atomic Cell Models. (Cells cell(i,j) are grouped into blocks of cells under a top block.)

1. Ask all components for their addresses: this stores the addresses in a container. Normally, this request need only be done once. However, one motivation for the containers-based formulation is that cells can easily be shifted from one node to another to balance computational load. An update of component addresses is needed after each such load balancing operation.

2. Compute the next event time: this is done by a reduction which takes the minimum of the components' times of next event.

3. Determine the imminent components: these are the components whose next event times are minimal. They are identified using the which? ensemble method.

4. Tell all the imminent components to sort and distribute their output messages, called the mail.

5. Tell all the imminent components to execute their internal transition functions.

6. Tell all components with incoming mail to execute their external transition functions with this mail as input.

Figure 14 illustrates the mapping of the top block and the nodal blocks onto the processing nodes to implement the watershed models linked with GIS. The mapping corresponds to that of the parallel CONTAINERs, since the DEVS block models are embodied in the CONTAINER environment. The top block functions as a coordinator of the containers, and the container on each processing node implements a DEVS block model. In other words, looking inside a node, we see it executing a DEVS


Figure 14: Mapping of Block Models to CM-5 Processing Nodes. (Watershed cells, with rainfall and evapotranspiration exchanges across air, vegetation (trees, shrub, grass), surface water, soil water and bedrock layers, are grouped into nodal blocks under a top block; the parallel CONTAINER layer maps the top block and each nodal block onto CM-5 processing nodes.)


simulation cycle in a block model based on serial containers. These serial blocks are coordinated at the next level by the DEVS simulation cycle using parallel CONTAINER ensemble methods. The parallel and synchronous nature of the ensemble methods supports deadlock-free synchronous simulation. The User Interface block dictates each step of the simulation to its container processors. In the current DEVS/CONTAINER architecture, the User Interface block holds the global time of next internal event, determined as the minimum of the nodal blocks' times of next internal event. The imminent nodal blocks generate output messages and execute their internal transitions. A sequential nodal block's time of next internal event is in turn the minimum of its components' (ultimately, its cells') times of next internal event.

4.3 Performance Measurement on the CM-5

The approach to parallel/distributed simulation just described can be categorized as conservative synchronization. It is guaranteed not to violate time causality during simulation. Moreover, its simplicity incurs significantly less overhead than other conservative and optimistic methods. The typical problem of conservative algorithms is low processor utilization; however, sequential DEVS simulation on each node can achieve high processor utilization rates. Simulation performance can be improved by employing the hierarchical structure of DEVS models to find optimal mappings of model components to processor blocks.

Figure 15: Mail Handling Scheme on the CM-5: Localized Sorting in Source Blocks and Localized Distribution. (The top block coordinates mail exchange among nodal blocks B1, B2, ..., BN.)

The output and input messages of a DEVS model are transferred in a form analogous to mail distribution. The mail approach has the advantage of treating messages as containers with specialized methods for assembling and disassembling a collection of contents (port, value and address structures). Exchange of mail among nodal blocks occurs at every cycle during a simulation run.


Since a major overhead in parallel simulation is physical communication, mail distribution algorithms must be designed to minimize communication time. Figure 15 illustrates the mail sorting and distribution scheme for step 4 of the DEVS simulation cycle. This scheme was established to be the best among several alternatives. Each source block sorts its output mail locally and sends it directly to the processing nodes containing the destination addresses. By partitioning the cellular array along columns, the neighboring nodes are easily computable and small in number (2). Sharing the communication network, all messages can be exchanged in parallel. The efficiency of the scheme depends on the number of nodal blocks, the size of the messages passed between blocks, and the speed and topology of the communication network.

Figure 16: Multicasting Time of Ensemble Methods on the CM-5. (Communication time, in seconds, versus number of processing nodes for the initial serial scheme, the multicasting scheme, and the parallel CONTAINER overhead.)

For scalability, the overhead of parallel CONTAINER operations must be kept low as the number of processors increases. Figure 16 shows test results on the CM-5 for 500 iterations of the DEVS simulation cycle for the watershed model. Initially, we used a serial technique whose communication time increases rapidly as the number of processors increases. This was later replaced by a multicasting technique for ensemble methods, which shows that a nearly flat dependence on the number of processors can be obtained. This multicasting function sets up a tree-style asynchronous


communication pattern with CMMD blocking message passing functions. Using the multicasting technique, the major communication cost in a DEVS simulation cycle is due to the exchange of mail among container processors. This overhead is shown by the dotted curve in Figure 16. Based on the experimental results to be discussed, the overhead consumes less than 10% of the overall communication/computation cycle. Besides distributed simulation, direct access to GIS databases is a major requirement for any computing environment capable of addressing complex ecological questions. This linkage was achieved by implementing a data handler object in the DEVS-C++ environment which processes data requests from models. The data object allows a model to be developed without knowledge of how the data is stored. Models can request information at a specific geographic location, and data is returned in the form that the model requires.

Figure 17: Experimental Results for Large Data Sizes on the CM-5. (Execution time, in thousands of seconds, versus data size, in thousands of cells, for the Sparc 2, the SparcServer 1000, and the CM-5 with 31 to 509 processing nodes.)

Experiments on sequential machines consisted of running a single block containing different numbers of cells on each machine. The sequential execution results on the Sparc 2 processor and the SparcServer 1000 workstation show that the latter is twice as fast as the former. The CM-5 employs a Sparc 2 processor with 32 megabytes of memory. Initial results showed that the ratio


of execution times became greater as the number of cells increased, due to dynamic memory management, which resulted in a superlinear speedup. Subsequently, the cell model was refined using an efficient memory management technique in the development process. A GIS data set, with 509 × 509 cells, was simulated on the CM-5 with one level of hierarchical construction. The experiments consist of simulating subsets of the full data set. The number of processing nodes used on the CM-5 increases in proportion to the data set: for an array of N × N cells, N processors are employed, each with a row of N cells. Figure 17 shows the parallel execution times on the CM-5 along with the sequential execution times on the Sparc 2 and SparcServer 1000 machines. Simulation of about a quarter million cells on the CM-5 takes less than 20 minutes, while it takes more than 64 hours on the Sparc 2 workstation. This translates to a speedup of 192.

5 The GA Searcher Layer

A major advantage of high resolution models is that many of the parameter values needed to calibrate the model to its real system counterpart are obtained directly from available engineering or measurement-derived data. Still, a large simulation model typically has many more parameters that are unknown. These parameters need adjustment to tune the model to real world observed behavior or to optimize its performance to achieve a desired objective. Searching through such large parameter spaces for optimal, or even acceptable, points is a daunting task, especially in multiple process models where each simulation run may require hours or days to complete. The more that automated optimizers can relieve human modellers of this search task, the faster will be the pace of advance in the modelling or design effort. Therefore, optimization-based control of simulation is a key feature of our high performance environment.

5.1 Genetic Algorithms

GAs are a class of stochastic operators that successively transform an initial population of individuals until a convergence criterion is met. Each individual represents a candidate system design, and each design is evaluated using the underlying simulation layer to give some measure of its fitness. On each iteration, a new population is formed by selecting fitter individuals and transforming them


in hopes of obtaining ones even fitter. Typically, crossover creates new children by combining parts from two parent individuals. Mutation creates new individuals by introducing small random changes. After some number of generations the search converges and is successful if the best individual represents the global optimum. GAs often outperform classical optimization methods in search, optimization and learning. GAs have been applied to a wide range of search/optimization problems such as scheduling, fuzzy systems and neural network design. Recently, interest has increased in their potential application to the modelling, simulation and design of complex real world systems. For a more complete explanation, please see the references [10, 11]. Adapted to the high performance simulation environment, GAs intelligently generate trial model candidates for simulation-based evaluation. Although schemes exist for parallelizing GAs [12], we designed a class called Distributed Asynchronous Genetic Algorithms (DAGAs) which is particularly suited to the demands of the high performance simulation environment. The following is an overview of the DAGA adapted to the optimization layer of the high performance simulation environment.

5.2 Distributed Asynchronous Genetic Algorithm

Figure 18: Asynchronous Genetic Algorithm. (The GA_Controller on the control processor maintains the gene pool and the genetic operations; GA_agents on the processing nodes run the evaluation simulations and return (new individual, fitness) messages.)

The DAGA is an extension of the Asynchronous Genetic Algorithm (AGA) [13]. The AGA maintains the genetic operations in one processing node (GA-controller) and distributes the evaluation (simulation) processes to many nodes (GA-agents), as shown in Figure 18. However, due to the centralization of the GA-controller, the communication overhead and the sequential genetic operations bottleneck processing as the number of processing nodes increases.


To achieve scalability, the DAGA distributes both evaluation processes and genetic operations to the processing nodes as shown in Figure 2. The operation of the DAGA is outlined as follows. Each GA agent (DAGA agent)

1. generates its own individual randomly (population initialization step),

2. evaluates the generated individual's fitness using simulation,

3. (as soon as this simulation is completed) randomly chooses a partner for mating among the other GA agents and gets its individual,

4. applies crossover and mutation operators and selects the child which is more similar to its own individual,

5. evaluates the selected child by executing a simulation,

6. replaces the local parent with the child if the child's fitness is better than that of both parents,

7. repeats steps 3-6 until a convergence criterion is met.

As with the original asynchronous scheme, GA agents do not wait for a full generation to complete, which would severely reduce throughput when simulation times are widely dispersed. In earlier work we showed that search success is not detrimentally affected by such asynchronous processing [2]. Moreover, in this scheme there is no central processing to bottleneck performance, since all genetic operations are carried out by the processors autonomously, with at most minimal exchange with a randomly chosen partner. The beneficial effect is shown in Table 5, which compares experimental results for the DAGA with the earlier scheme applied to the same problem of optimizing a fuzzy controller design for an inverted pendulum [2]. Note that while the original scheme's performance does not scale with increasing processors, the distributed version achieves quite close to a linear speedup dependence on the number of processors.

# of PEs   1        32        64        128       256        512
AGA        14,410   770 (18)  512 (28)  282 (51)  254 (56)   259 (55)
DAGA       12,940   567 (22)  298 (43)  142 (91)   71 (182)   36 (359)

Table 5: Execution Times of the Asynchronous Genetic Algorithm and the Distributed Asynchronous Genetic Algorithm on the CM-5. (The unit is seconds; the numbers in parentheses are the speedups.)


Figure 19: GA-Controlled High Performance Simulation Environment. (System entity structure with DAGA-controlled and AGA-controlled specializations, each composed of simulators paired with GA agents; the AGA variant adds a central AGA controller.)

Figure 19 shows the system entity structure of the GA-controlled high performance simulation environment. This environment can be implemented in two ways, depending on the available resources and the computational complexity of the simulation. The DAGA-based implementation gives better performance than the AGA-based one as the number of computing resources (processing elements) becomes larger and the computational complexity of the simulation becomes smaller, as shown in Table 5. Otherwise, the AGA-based implementation is preferred, since it is simpler and more flexible than the DAGA.

6 Conclusions

This paper provided an overview of work in progress to construct a high performance simulation environment capable of supporting the modelling, design and testing of large scale systems with natural and artificial components at high levels of resolution. We have demonstrated the advantage of using the DEVS formalism to represent both continuous and discrete components in a large scale mixed natural/artificial model. Discrete event models provide a natural framework to include intelligent components, and we have illustrated how the DEVS formalism for discrete event modelling can include efficient, high fidelity representations of continuous systems as well. For example, in the case of watershed behavior discussed in the paper, traditional approaches based on partial differential equations decompose the watershed into "parking lots", each with built-in channel flow; without such coarse representation, such simulations would take months or years to complete. In contrast, our high resolution model allows channel flows to "emerge" from the underlying water dynamics and landscape topography.


Figure 20: Brown's Pond Elevation Map. (Altitude in meters over an 80 × 80 grid of 40 m cells.)

Figure 21: Brown's Pond Runoff (m3/hour) after 2 simulated hours (1 hour after the end of a 1 hour long rainfall).


Figure 20 is an elevation map of a real watershed, Brown's Pond. Figure 21 shows the distribution of runoff at some time after a uniform rainfall. One can see that channels have formed that are clearly correlated with the topography. It should be noted that hybrid models containing both discrete and continuous components offer an attractive alternative and are undergoing intensive research. However, without mapping the continuous parts to DEVS, they cannot exploit the thousand-fold speedups necessary to achieve feasible optimizations of complex systems. Working versions of the modelling, simulation and search layers of this environment have been completed. Coded in the object-oriented language C++, this software runs on both serial and parallel computing platforms and is available for PCs, workstations and supercomputers including the CM-5 and IBM SP2. The universality of the DEVS modelling formalism, the portability of the C++ implementations, and the robustness of the GA search layer are intended to facilitate widespread use of the environment.

                                        2,500 cells     10,000 cells    250,000 cells
1 simulation run on CM-5 with 1 node    12 days         53 days         7 years
50,000 runs on CM-5 with 1 node         15 centuries    70 centuries    3,350 centuries
50,000 runs on CM-5 with 512 nodes      3 years         14 years        6.7 centuries

Table 6: Estimated Execution Times for Watershed Simulation and Optimization with the Discrete Time Model on the CM-5 (2 hour long rainfall event).

                                        2,500 cells     10,000 cells    250,000 cells
1 simulation run on CM-5 with 1 node    15 minutes      1 hour          2 days
50,000 runs on CM-5 with 1 node         1.4 years       6 years         3 centuries
50,000 runs on CM-5 with 512 nodes      1 day           4.4 days        6 months

Table 7: Estimated Execution Times for Watershed Simulation and Optimization with the Quantized/Granulized DEVS Model on the CM-5 (2 hour long rainfall event).
We have illustrated with actual experiments how each of the sources (DEVS representation, distributed/parallel simulation, and distributed GA-based search) can individually achieve thousand-fold speedups. However, when taken together, we may expect that the whole is less than the sum (or better, the product) of its parts. In particular, given a fixed number of processors N (e.g., 9,000 on the planned Intel platform), we can hope to achieve at most an N-fold speedup of the combined simulation and search layers (even though N-fold speedup is achievable in each layer when all processors are dedicated to it). Indeed, to achieve the N-fold speedup will require an effective and


dynamic allocation of resources to the simulation and GA agents (in the experiments reported here, an acceptable level of load balancing was achieved manually). Such an "operating system" design has been proposed, with encouraging simulation results [14]. With the DEVS representation and the combined simulation/search layers each affording a thousand-fold speedup, taken together the sources of high performance can achieve at least a million-fold speedup over current workstation performance levels. As indicated, increases in performance of such scales will make possible ambitious studies that are not feasible today. For example, Tables 6 and 7 show execution times of watershed simulations for three different block sizes on the CM-5 with one and with 512 nodes. The problem solved in each case is to find the optimum assignment of surface roughness parameters to the 10 regions needed to calibrate a model to a given runoff profile. This optimization takes approximately 50,000 iterations with GA search. The execution times of each run of the DEVS quantized/granulized model (D=0.1, d=0.01) for three cell-array sizes (50 × 50, 100 × 100 and 500 × 500) were measured; all other execution times are estimated from relationships discussed in the paper. As can be seen, only the 50 × 50 and 100 × 100 GA optimizations using the DEVS quantized/granulized model are feasible with today's technology. These results show that some simulation-based studies, such as modelling large watersheds at high resolution, must still await the next generation of gigaflop platforms and/or the emergence of the tremendous distributed computing resources of the Internet. When this happens, there could be a tremendous increase in the reliability, safety and effectiveness of tomorrow's complex systems, such as flood and hurricane warning systems, forest fire fighting robotic systems, or space-based reconnaissance systems.

References

[1] G. Almasi and A. Gottlieb, Highly Parallel Computing. Reading, MA: Addison-Wesley, 1989.

[2] J. Kim, Y. Moon, and B. P. Zeigler, "Designing Fuzzy Neural Net Controllers using Genetic Algorithms," IEEE Control Systems Magazine, vol. 15, no. 3, pp. 66-72, 1995.

[3] A. Louri, H. Sung, Y. Moon, and B. P. Zeigler, "An Efficient Signal Distinction Scheme for Large-scale Free-space Optical Networks Using Genetic Algorithms," in Photonics in Switching: Topical Meeting, OSA, Salt Lake City, Utah, pp. 90-92, Mar. 12-17, 1995.

[4] B. P. Zeigler, Y. Moon, V. L. Lopes, and J. Kim, "DEVS Approximation of Infiltration Using Genetic Algorithm Optimization of a Fuzzy System," Mathematical and Computer Modelling, vol. 23, pp. 215-228, June 1996.

[5] B. P. Zeigler, T. G. Kim, H. Praehofer, and H. S. Song, "DEVS Framework for Modelling, Simulation, Analysis, and Design of Hybrid Systems," in Lecture Notes in Computer Science (P. Antsaklis and A. Nerode, eds.), pp. 529-551, Springer-Verlag, 1996.

[6] Y. C. Ho, ed., "Special Issue on Discrete Event Dynamic Systems," Proceedings of the IEEE, vol. 77, no. 1, 1989.

[7] A. Chow and B. P. Zeigler, "Revised DEVS: A Parallel, Hierarchical, Modular Modeling Formalism," in Proceedings of the Winter Simulation Conference, 1994.

[8] T. Maxwell and R. Costanza, "Distributed Modular Spatial Ecosystem Modeling," International Journal in Computer Simulation, vol. 5, pp. 247-262, 1995.

[9] Y. R. Kim, T. G. Kim, and K. H. Park, "Mapping Hierarchical, Modular Discrete Event Models in a Hypercube Multicomputer," Simulation Practice and Theory, vol. 2, May 1995.

[10] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag, 1992.

[11] T. Bäck and H.-P. Schwefel, "An Overview of Evolutionary Algorithms for Parameter Optimization," Evolutionary Computation, vol. 1, no. 1, pp. 1-23, MIT Press, 1993.

[12] V. S. Gordon and D. Whitley, "Serial and Parallel Genetic Algorithms as Function Optimizers," in Proceedings of the 5th International Conference on Genetic Algorithms, Urbana-Champaign, IL, pp. 177-190, July 1993.

[13] J. Kim and B. P. Zeigler, "Hierarchical Distributed Genetic Algorithms in an Intelligent Machine Architecture," IEEE Expert, vol. 11, June 1996.



[14] J. Kim and B. P. Zeigler, "A Framework for Multi-resolution Optimization in a Parallel/Distributed Computing Environment: Simulation of Hierarchical GAs," Journal of Parallel and Distributed Computing, 1995 (accepted).