
This paper appeared in the Proceedings of the International Conference on Parallel Processing, ICPP-1998. © 1998, IEEE. Personal use of this material is permitted. However, permission to reprint or republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

On-line Configuration of a Time Warp Parallel Discrete Event Simulator

Radharamanan Radhakrishnan, Nael Abu-Ghazaleh, Malolan Chetlur and Philip A. Wilsey
Computer Architecture Design Laboratory
Dept. of ECECS, PO Box 210030, Cincinnati, OH 45221–0030
(513) 556-4779, [email protected]

Abstract

In Time Warp simulations, the overheads associated with rollbacks, state saving, and the communication induced by rollbacks are the chief contributors to the cost of the simulation; thus, these aspects of the simulation have been the primary targets for optimization. Unfortunately, the behavior of a Time Warp simulation is highly dynamic and greatly influenced by the application being simulated, so the suggested optimizations are effective only for certain intervals of the simulation. This paper argues that the performance of Time Warp simulators benefits from a dynamic on-line decision process that selects and configures the sub-algorithms implementing the different aspects of the simulator to best match the current behavior of the simulation. In particular, we study control strategies to dynamically: (i) adjust the checkpointing (or state-saving) interval; (ii) select the cancellation strategy (lazy or aggressive); and (iii) determine the policy for aggregating the application messages (an optimization that significantly improves performance in message-passing environments). The strategies have been implemented in the WARPED Time Warp simulation kernel, and the performance obtained via the dynamically controlled optimizations is shown to surpass that of their best-performing static counterparts.

1 Introduction

Time Warp is an optimistic synchronization model that has been used extensively for parallel discrete event simulation [10]. It offers several advantages over conservative synchronization approaches [19] and has the potential to outperform them [9]. Unfortunately, Time Warp simulators have yet to realize this potential consistently. The performance of the simulator is affected by the choice of sub-algorithms implementing the different aspects of the simulation kernel, and by the internal parameter settings for these sub-algorithms (we call these algorithms and their set of internal parameters the configuration of the simulator). If an unsuitable configuration is chosen, the simulator suffers from problems such as instability (due to excessive rollbacks), excessive (erroneous) optimistic computation, high memory usage [23], and high communication overheads caused by excessive rollbacks [6]. Thus, careful choice of the configuration of the simulator is imperative for efficient execution.

The choice of an efficient configuration is complicated by the highly dynamic and unpredictable nature of the Time Warp model. In addition, the simulation may pass through different phases in which the optimal configuration changes. Several researchers have recognized this dynamic nature and suggested adaptive techniques for optimizing specific aspects of the simulator [7, 14, 20, 29]. The use of adaptive techniques in a distributed application is difficult because changes that are effected locally have secondary system-wide effects. This is especially true for the Time Warp model because the progress of the simulation is nondeterministic.

This paper investigates the use of on-line configuration in Time Warp simulators. It develops a general feedback control framework [1] for implementing on-line configuration, and illustrates its success in Time Warp simulators by using it to control three different facets of the simulator's operation.

The remainder of the paper is organized as follows. Section 2 describes Parallel Discrete Event Simulation (PDES) and the Time Warp model. Section 3 presents the control model used for on-line configuration. Section 4 introduces periodic check-pointing and develops the control system used to configure the checkpoint period. Section 5 presents message cancellation algorithms and develops the control system used to dynamically switch among them. Section 6 describes the dynamic message aggregation optimization of the communication module of Time Warp, and develops the control model for optimizing the size of the aggregation window. Section 7 describes the experimental environment. The control models are analyzed, and the performance results are shown, in Section 8. Finally, Section 9 contains some concluding remarks.


2 Time Warp

* Support for this work was provided in part by the Advanced Research Projects Agency under contracts J–FBI–93–116 and DABT63–96–C–0055.

In a Time Warp synchronized discrete event simulation, Virtual Time [10] is used to model the passage of time in the simulation. Changes in the state of the simulation occur as

1

Figure 1. A Simulation Object in a Time Warp Simulation (a physical process, PP, with its input, state, and output queues; straggler messages arrive among the inputs, and anti-messages are issued among the outputs)

Figure 2. Feedback Control (an adaptive element samples the process outputs and adjusts the process control inputs)


events are processed at specific virtual times. In turn, events may schedule other events at future virtual times. The virtual time defines a total order on the events of the system. The simulation state (and time) advances in discrete steps as each event is processed.

The simulation is executed via several simulator processes, called simulation objects. Each simulation object is constructed from a physical process (PP) and three history queues; Figure 1 illustrates the structure of a simulation object. The input and output queues store incoming and outgoing events, respectively. The state queue stores the state history of the simulation object. Each simulation object maintains a clock that records its Local Virtual Time (LVT). Simulation objects interact with each other by exchanging time-stamped event messages. One departure from Jefferson's original definition [10] of Time Warp is that simulation objects are placed into groups called "logical processes" (LPs).

The simulation objects must be synchronized in order to maintain the causality of the simulation; although each simulation object processes local events in their (locally) correct time-order, events are not globally ordered. Fortunately, each event need only be ordered with respect to the events that affect it (and, conversely, the events it affects); only a partial order of the events is necessary for correct execution [11]. Synchronization protocols can be classified into conservative [19] and optimistic [8, 10] protocols. In a conservatively synchronized simulation, an event is executed only when the simulation object receives guarantees that all other simulation objects will not produce an event that will invalidate it; each simulation object must have reached an LVT equal to the time of the event to be processed.¹ Accordingly, a total order in processing the events is enforced. Under optimistically synchronized protocols (e.g., the Time Warp model [10]), simulation objects execute their local simulations autonomously, without explicit synchronization. A causality error arises if a simulation object receives a message with a time-stamp earlier than its LVT value (a straggler message). In order to allow recovery, the state of the simulation object and the output events generated are saved in history queues as each event is processed.

Figure 3. Thresholding for Cancellation (an upper threshold and a lower threshold with a dead zone between them; the output changes only when the input crosses beyond one of the thresholds)

When a straggler message is detected, the erroneous computation must be undone; a rollback occurs. The rollback process consists of two steps: (i) the state of the simulation object is restored to a state prior to the straggler message's time-stamp, and (ii) erroneously sent output messages are canceled (by sending anti-messages that nullify the original messages). The global progress time of the simulation, called Global Virtual Time (GVT), is defined as the lowest of the LVT values of the simulation objects [8, 13, 18]. Periodic GVT calculation is necessary to reclaim the memory space used by the history queues; history items with a time-stamp lower than GVT are no longer needed, and are deleted to make room for new history items.
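As a concrete illustration of this recovery sequence, the following C++ sketch shows how a simulation object might restore its state and emit anti-messages. It is a minimal sketch, not the WARPED implementation; all type and member names are hypothetical.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

using VTime = std::int64_t;                       // virtual time-stamp

struct Event { VTime time; int payload; };        // a time-stamped message
struct State { VTime time; int data; };           // a saved snapshot

// Hypothetical simulation object holding two of the history queues of
// Figure 1 (the input queue is omitted for brevity).
struct SimObject {
    VTime lvt = 0;
    std::deque<State> stateQueue;                 // saved states, time-ordered
    std::deque<Event> outputQueue;                // sent messages, time-ordered

    // Invoked when a straggler (time-stamp < LVT) is detected.
    void rollback(VTime stragglerTime) {
        // (i) restore the latest state saved prior to the straggler's time.
        while (!stateQueue.empty() && stateQueue.back().time >= stragglerTime)
            stateQueue.pop_back();
        lvt = stateQueue.empty() ? 0 : stateQueue.back().time;

        // (ii) cancel erroneously sent output with anti-messages
        //      (aggressive cancellation; lazy cancellation defers this step).
        while (!outputQueue.empty() && outputQueue.back().time >= stragglerTime) {
            std::cout << "anti-message for t=" << outputQueue.back().time << "\n";
            outputQueue.pop_back();
        }
    }
};

int main() {
    SimObject o;
    o.stateQueue  = {{0, 0}, {5, 1}, {9, 2}};
    o.outputQueue = {{6, 42}, {10, 7}};
    o.rollback(7);                                // restores t=5, cancels t=10
    std::cout << "LVT after rollback: " << o.lvt << "\n";
}
```

Fossil collection (reclaiming history items older than GVT) would simply pop items from the front of the same queues once a new GVT estimate arrives.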

3 Linear Feedback Control Systems

In feedback control systems, the system behavior is changed in response to observed output values, in order to force the system to a desired state [1]. More precisely, a feedback control system samples output values and adjusts input values in order to meet some performance criteria (Figure 2). For example, in a Time Warp simulator, the output may be the number of events committed between GVT cycles, the number of events rolled back upon receipt of a straggler message, the LVT, or any other information that can be observed about the system. Such values have been used in several investigations for dynamically adjusting simulation parameters [2, 14, 17, 20, 21, 22, 28]. Virtually all dynamic control investigations have also used data filtering techniques to smooth the sampled values and to prevent spurious data points from causing wide variations in parameter adjustment. In this research, we have found that non-linear thresholding functions are best suited for damping the selection of the cancellation strategy. A thresholding function defines boundaries on input values that determine the output value produced.

¹ There is a model-dependent tolerance for this condition; the more general condition is that the LVT of the other simulation objects is within a lookahead value of the event time.



For example, Figure 3 demonstrates the filtration process used for the dynamic control of the cancellation strategies (discussed in more detail in Section 5). Two thresholds are used, with a dead zone lying between them. In this example, the function changes its value only after the input moves into the region above the upper threshold or below the lower threshold. When the input falls in the dead zone, the function continues to produce the same output.

Unlike analog control systems, the feedback control logic competes for the same CPU cycles that are used for useful computation. Thus, the configuration should not be adapted at a high frequency, or the overhead of tuning the simulator will outweigh the benefits of the better configuration [26].

A configuration control system is characterized by the tuple ⟨O, I, S, T, P⟩, where O is the sampled output, I is the current state of the parameter under configuration, S is the initial configuration, T is a transfer function from O to I′ (where I′ is the new configuration), and P is the configuration period. This model is further elaborated in the remainder of this paper, where the performance tuning of three independent aspects of Time Warp simulators using on-line configuration is discussed.
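To make the tuple concrete, a minimal C++ sketch follows (C++ because WARPED itself is a C++ library). All identifiers are our own illustrative choices; the dead-zone transfer function anticipates the cancellation controller of Section 5.

```cpp
#include <functional>
#include <iostream>

// Sketch of the <O, I, S, T, P> configuration control system described above.
template <typename O, typename I>
class ConfigController {
public:
    ConfigController(I initial, unsigned long period,
                     std::function<I(const O&, const I&)> transfer)
        : setting_(initial), period_(period), transfer_(std::move(transfer)) {}

    // Sample the observed output; reconfigure only once every 'period_'
    // invocations so that the control overhead stays bounded.
    void sample(const O& observed) {
        if (++count_ % period_ != 0) return;      // P: configuration period
        setting_ = transfer_(observed, setting_); // T: transfer function
    }

    const I& current() const { return setting_; }

private:
    I setting_;                    // I: the parameter under configuration
    unsigned long period_;
    unsigned long count_ = 0;
    std::function<I(const O&, const I&)> transfer_;
};

// A transfer function with a dead zone (Figure 3): the boolean setting
// changes only when the input crosses the upper or lower threshold.
bool deadZone(const double& in, const bool& current) {
    const double lower = 0.2, upper = 0.45;       // illustrative thresholds
    if (in > upper) return true;
    if (in < lower) return false;
    return current;                               // inside the dead zone
}

int main() {
    ConfigController<double, bool> ctl(/*S=*/false, /*P=*/2, deadZone);
    for (double v : {0.5, 0.5, 0.3, 0.3, 0.1, 0.1})
        ctl.sample(v);
    std::cout << std::boolalpha << ctl.current() << "\n";  // prints: false
}
```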

Figure 4. Rollback and Coasting Forward (χ = 4): events, state-saving points, the straggler message, the rollback length, and the coast-forward length along the virtual time axis t

4 Check-pointing Strategies

Check-pointing is necessary to protect against erroneous optimistic computation. Each Logical Process (LP) must periodically save its local state so that, in the event of a causality error, a rollback to a correct state is possible. Time Warp objects with large states require considerable memory space, as well as CPU cycles, for state saving. In general, states are saved after every event execution. However, the check-pointing cost can be reduced by saving the state infrequently. In the simple case, a Time Warp simulator checkpoints every χ events; this scheme is called Periodic Check-pointing. Figure 4 illustrates a process rolling back upon the receipt of a straggler message (white circle). The shaded circles denote events and the filled boxes denote state-saving points. Since check-pointing does not occur after every processed event, a rollback may have to re-execute intermediate events (coast forward). Thus, an optimal checkpoint interval must balance the cost of check-pointing against the cost of the coast-forward phase. Excessively large check-pointing periods have long coast-forward phases but low check-pointing cost; the opposite is true for excessively small check-pointing periods. Using periodic state saving, the arrival of a straggler message may require the system to roll back to an earlier state than necessary and coast forward [8]. While coasting forward, no messages are sent out to the other processes in the system (the previously sent messages are correct). The difficulty of periodic state saving is determining an appropriate fixed frequency for check-pointing. Some applications operate best with a fairly small value, while others require much larger values [3, 25]. Currently, no practical techniques for statically analyzing simulation applications to decide the checkpoint frequency are known, motivating investigations into dynamically adjusting the checkpoint interval.

There have been a variety of proposals for dynamically controlling the checkpoint interval [7, 14, 21, 29]. These techniques employ adaptive control to dynamically establish values for the checkpoint interval. The performance parameters monitored in these studies include the average time to execute one event, the average time to save the process state, and the number of rollbacks. All of the studies sample the output values for each individual process, and recalculate the optimal checkpoint interval infrequently. Both Lin [14] and Palaniswamy [21] derive an analytical model of a Time Warp simulator and use the model to derive a formula for dynamically establishing values for the checkpoint interval. Unfortunately, each approach results in a fairly complex formula that requires significant processing overhead for evaluation and data capture. Consequently, we developed a feedback control system for dynamically adapting the check-pointing interval [7]. This control system may be described using the tuple ⟨Ec, χ, χ0, A, P⟩. More precisely, the control system monitors a cost index for check-pointing, Ec, that is the sum of the state-saving costs and the coasting-forward costs in the interval starting at the previous invocation. χ is the checkpoint interval, the parameter under configuration. The control system operates under the assumption that the optimal period results in the minimum value of Ec. The simulation starts with an initial period χ0. The transfer function, A, is the following: at every control invocation, if Ec is not observed to have increased significantly, the check-pointing period is incremented; otherwise, it is decremented. P is the period for the invocation of the control process. The control system converges to the optimal period, provided the single-minimum assumption holds.
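A minimal sketch of this transfer function is shown below; the class name, the tolerance parameter, and the unit increments are our illustrative choices, not values prescribed by the paper.

```cpp
#include <cstdio>

// Sketch of the <Ec, chi, chi0, A, P> check-pointing controller: at each
// control invocation it compares the current cost index Ec (state-saving
// time plus coast-forward time since the last invocation) with the previous
// sample, and nudges the checkpoint interval chi accordingly.
struct CheckpointController {
    int    chi;                  // save state every chi events
    double prevCost = -1.0;      // Ec observed at the previous invocation

    explicit CheckpointController(int chi0) : chi(chi0) {}

    // Transfer function A: widen chi while the cost is not rising
    // significantly; back off otherwise. Assumes Ec(chi) has a single minimum.
    void adjust(double ec, double tolerance = 0.05) {
        if (prevCost >= 0.0 && ec > prevCost * (1.0 + tolerance))
            chi = (chi > 1) ? chi - 1 : 1;   // cost increased: decrement
        else
            chi = chi + 1;                   // otherwise: increment
        prevCost = ec;
    }
};

int main() {
    CheckpointController ctl(/*chi0=*/1);
    // Hypothetical Ec samples observed at successive control invocations.
    double samples[] = {10.0, 9.0, 8.5, 8.4, 9.5, 8.6};
    for (double ec : samples) {
        ctl.adjust(ec);
        std::printf("Ec=%.1f -> chi=%d\n", ec, ctl.chi);
    }
}
```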



5 Cancellation Strategies

The performance of the simulator is highly dependent on the efficiency of the cancellation strategy employed to undo the effects of erroneous computation (the sending of anti-messages). Two cancellation strategies are known [8]: aggressive-cancellation and lazy-cancellation. Under aggressive-cancellation, the arrival of a straggler message forces the immediate generation of anti-messages to cancel erroneous outputs. In contrast, lazy-cancellation delays sending anti-messages until forward processing demonstrates, by comparison of old and new output, that the original output messages were incorrect.


Thus, there is a potential for a reduction in communication, as well as in wasted optimistic computation. Performance under lazy-cancellation is better than under aggressive-cancellation if the regenerated output messages do not differ widely from the prematurely sent messages. Studies have shown that lazy-cancellation can perform better than aggressive-cancellation, but that, even within the same application domain, some models perform better under aggressive-cancellation [2, 28]. In our experiments using digital systems models [27] written in the hardware description language VHDL [24], we observed that:

- Neither aggressive nor lazy-cancellation is clearly superior to the other.
- The optimal strategy is sensitive to the partitioning scheme.
- Different LPs within the same application operate best under different cancellation strategies.
- Within a single LP, the optimal cancellation strategy varies over the lifetime of the simulation.
- The optimal cancellation strategy depends on the application and the platform.

Accordingly, a static selection of the cancellation strategy does not yield the optimal configuration. Most Time Warp simulators support the two cancellation strategies in the form of a compile-time (or simulation-time) switch; the selection of the cancellation strategy is the responsibility of the user. Lin's analysis [12] demonstrates that, even with a number of unrealistic assumptions, a static analysis to determine the optimal cancellation strategy is complex and requires perfect knowledge of the simulation. Thus, we propose that the selection be performed dynamically by the simulator using on-line configuration.

The control model for dynamically selecting the cancellation strategy can be described as ⟨HR, I, Aggressive, A, P⟩. HR is the Hit Ratio, an index defined on the observed cancellation behavior of the simulator. I is the selected cancellation strategy; I ∈ {Aggressive, Lazy}. The initial state of the system is aggressive-cancellation. A is the heuristic used to determine which strategy to use next, and P is the period between invocations of the control process. The remainder of this section describes HR and A in detail.

As a result of our experiments, we have found that the optimal cancellation strategy correlates well with a performance index we call the Hit Ratio (HR). HR is a measure of how productive an LP's premature computations were in its recent past. Under aggressive-cancellation, an LP is said to have a Lazy Aggressive Hit if it generates the same message before and after a rollback; otherwise, it is said to have a Lazy Aggressive Miss. (Similarly, under lazy-cancellation, a Lazy Hit occurs when a message regenerated after a rollback matches the prematurely sent original, so no anti-message need be sent.) Each LP maintains a dynamic record of the past n output-message comparisons. The number of comparisons, n, is statically controlled (by the user) and is called the Filter Depth. In order to update HR under aggressive-cancellation, the simulation must continue to monitor whether the sent messages have different values after the rollback; the overhead of this comparison is small. The time required to switch between the two strategies is negligible with respect to the rollback time. The Hit Ratio is then defined as follows:

Hit Ratio = (#Lazy Aggressive Hits + #Lazy Hits) / Filter Depth

The cancellation strategy is determined using a thresholding function whose input is the Hit Ratio. The threshold for switching from aggressive to lazy-cancellation is called the A2L Threshold, and the threshold for switching from lazy to aggressive-cancellation is called the L2A Threshold. These two thresholds can have the same value, in which case the dead zone is eliminated. If the Hit Ratio is high (e.g., above 0.4), the LP favors lazy-cancellation; conversely, if it is low (e.g., below 0.2), the LP favors aggressive-cancellation. These thresholds are fixed at compile time (optimal values for them are currently determined empirically). One possible heuristic for switching between the algorithms is to monitor HR. The simulation starts with aggressive-cancellation. Whenever HR rises above the A2L Threshold, the simulation switches to lazy-cancellation; if HR falls below the L2A Threshold, the simulation switches back to aggressive-cancellation. Thrashing between strategies (continuous switching among them) is minimized by: (i) a large filter depth; (ii) infrequent invocation of the control mechanism; and (iii) the hysteresis introduced by the dead zone between the two thresholds.
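A sketch of the HR bookkeeping and the switching heuristic A is shown below; the ring-buffer layout and all identifiers are our own illustrative choices, not the WARPED implementation.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

enum class Strategy { Aggressive, Lazy };

// Sketch of the <HR, I, Aggressive, A, P> cancellation controller. Each LP
// records the outcomes of its last Filter Depth output-message comparisons
// in a circular buffer; HR is the fraction of hits in that window.
class CancellationController {
public:
    CancellationController(std::size_t filterDepth, double l2a, double a2l)
        : hits_(filterDepth, false), l2a_(l2a), a2l_(a2l) {}

    // 'hit' == true when the message regenerated after a rollback is
    // identical to the one sent prematurely before it.
    void recordComparison(bool hit) {
        hits_[next_] = hit;
        next_ = (next_ + 1) % hits_.size();
    }

    // HR = (#hits in the window) / Filter Depth.
    double hitRatio() const {
        std::size_t n = 0;
        for (bool h : hits_) n += h ? 1 : 0;
        return static_cast<double>(n) / hits_.size();
    }

    // Heuristic A with a dead zone: switch to lazy above the A2L threshold,
    // back to aggressive below the L2A threshold, otherwise keep the current
    // strategy (hysteresis suppresses thrashing).
    void invokeControl() {
        const double hr = hitRatio();
        if (hr > a2l_)      strategy_ = Strategy::Lazy;
        else if (hr < l2a_) strategy_ = Strategy::Aggressive;
    }

    Strategy strategy() const { return strategy_; }

private:
    std::vector<bool> hits_;                   // last Filter Depth outcomes
    std::size_t next_ = 0;
    double l2a_, a2l_;                         // L2A and A2L thresholds
    Strategy strategy_ = Strategy::Aggressive; // initial state I = Aggressive
};

int main() {
    CancellationController ctl(/*filterDepth=*/16, /*l2a=*/0.2, /*a2l=*/0.45);
    for (int i = 0; i < 10; ++i) ctl.recordComparison(true);  // 10 hits of 16
    ctl.invokeControl();                                      // HR = 0.625
    std::cout << (ctl.strategy() == Strategy::Lazy ? "lazy" : "aggressive") << "\n";
}
```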

6 Dynamic Message Aggregation

Dynamic Message Aggregation (DyMA) is an optimization that matches the communication behavior of Time Warp to the underlying communication system. Using DyMA, the communication module of each LP collects application messages that are destined to the same LP and occur in close temporal proximity, and sends them as a single physical message. Since there is a large overhead associated with each physical message (regardless of the message size), significant improvement in performance is possible. The decision of when to send the messages is made by the aggregation policy. Clearly, the higher the number of messages aggregated, the greater the reduction in the communication overhead; and the longer the messages are delayed, the greater the number of messages aggregated. However, delaying messages excessively harms the performance of the receiving LP. Thus, aggregation policies must balance the potential gain from additional message aggregation against the potential loss at the receiving LP. It is difficult to determine a static balance between the two factors; the communication behavior of Time Warp simulators is highly dynamic and unpredictable. While static window sizes (in time, or in number of messages) for aggregation yield some performance improvement, better overall performance results from dynamic control of the aggregate size. In specifying the window size, we seek to balance the benefit of aggregating more messages against the harm of delaying messages excessively. These two factors are modeled as: (i) the Aggregation Optimistic Factor (AOF): AOF is the gain from delaying the messages; it is proportional to the rate of reception of messages to be sent. If AOF is high, a large number of messages are aggregated without excessive delay;


and (ii) the Aggregation Pessimistic Factor (APF): APF models the harm from delaying the messages; it is proportional to the age of the aggregate. Note that both of these factors vary with the nature of the application, and may change dynamically within the lifetime of the simulation.

Initially, a static policy that aggregates messages for a fixed time, called Fixed Aggregation Window (FAW), was developed. The age of the first message received by the aggregation layer is tracked. Once this age reaches a constant value (the size of the window), the aggregate message is sent. The advantage of this policy is its low overhead; only a single check of the current aggregate age (the time that the aggregate has been alive) against the constant window size is required. This policy provides a static balance between AOF and APF, making it insensitive to changes in the communication behavior of the application: no matter how high (or low) the message arrival rate is, the fixed window size is used. Since the chosen window size significantly affects the performance of this policy, dynamic control of the balance between AOF and APF is desirable.

A dynamic policy, called Simple Adaptive Aggregation Window (SAAW), is suggested. SAAW extends FAW to adapt the window size with the message arrival rate. The initial aggregation window is specified statically, as in the case of FAW. During the simulation, the message rate achieved by an aggregate is calculated when the aggregate is sent, and is used to decide the aggregation window for the next aggregate. Changing the aggregation window size allows the policy to adapt its behavior to the behavior of the application. For example, if the application exhibits bursty communication behavior, the aggregation window size is increased to take advantage of the higher AOF. The overhead for implementing SAAW is slightly higher than that of FAW; there is an additional computation to determine the window size when the aggregate is sent.

The control system used by the SAAW strategy is described by ⟨Rage, W, Winitial, SAAW, every aggregate⟩. Rage is the rate of reception of messages, modified to reflect the age of the aggregate: an aggregate a1 that achieves a message reception rate r is considered to have a higher modified rate than an aggregate a2 with the same rate r if the age of a1 is smaller than that of a2. W is the window size, the parameter being configured. Winitial is the initial window size. SAAW is the control heuristic; in our implementation, W is increased if Rage has increased relative to the last aggregate, and vice versa. Finally, the window size is adapted as each aggregate is sent.
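The following sketch shows one plausible realization of the SAAW adjustment. The exact definition of the age-modified rate and the growth/shrink factors are our assumptions; the paper specifies only that W rises when Rage rises relative to the previous aggregate, and falls otherwise.

```cpp
#include <cstdio>

// Sketch of the SAAW policy: when an aggregate is flushed, compute its
// age-weighted reception rate R_age and grow or shrink the window used for
// the next aggregate. All names and constants are illustrative.
struct SaawPolicy {
    double window;             // W: current aggregation window (e.g., in ms)
    double prevRate = -1.0;    // R_age of the previous aggregate

    explicit SaawPolicy(double wInitial) : window(wInitial) {}

    // Called when an aggregate of 'count' messages is sent after 'age' time
    // units. One plausible modified rate: messages per unit of age, so equal
    // raw rates achieved at a smaller age rank higher, as the text requires.
    void onAggregateSent(unsigned count, double age) {
        const double rAge = count / age;
        if (prevRate >= 0.0) {
            if (rAge > prevRate) window *= 1.25;  // bursty arrivals: widen
            else                 window *= 0.8;   // slower arrivals: shrink
        }
        prevRate = rAge;
    }
};

int main() {
    SaawPolicy p(/*wInitial=*/5.0);
    p.onAggregateSent(10, 5.0);   // R_age = 2.0
    p.onAggregateSent(20, 5.0);   // R_age = 4.0: window widens
    p.onAggregateSent(4,  5.0);   // R_age = 0.8: window shrinks
    std::printf("window = %.2f\n", p.window);
}
```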

7 The Experimental Environment

The WARPED simulation kernel provides the functionality to develop applications modeled as discrete event simulations [15, 16]. Considerable effort has been made to define a standard Application Programming Interface (API) that hides the details of Time Warp from the application. All Time Warp specific activities, such as state saving and rollback, are performed by the kernel without intervention from the application.


Consequently, an implementation of the WARPED interface can be constructed using either conservative [19] or optimistic [4, 8] parallel synchronization techniques. Furthermore, the simulation kernel can operate as a sequential kernel. The WARPED system is composed of a set of C++ class and template libraries that the user accesses in several ways. Where the kernel needs information about data structures within the application, they are passed into kernel template classes. When kernel data or functions need to be made available to the user, they can be accessed through one of two mechanisms: inheritance and method invocation. A more detailed description of the internal structure and organization of the WARPED kernel is available on the www at http://www.ececs.uc.edu/˜paw/warped.

The results reported in this paper were obtained by executing two different simulation models on a network of SUN (SPARC 4 and 5) workstations interconnected by 10Mb Ethernet. To fully test the system, the network of workstations chosen for the experiments was not dedicated to the experiments. Five sets of measurements were taken at two different times, and the average of these values is reported. The two models (available in the WARPED distribution) are:


SMMP: The SMMP application models a shared memory multiprocessor. Each processor is assumed to have a local cache with access to a common global memory. (The model is somewhat contrived in that requests to the memory are not serialized; i.e., main memory can have multiple requests pending at any given moment.) The model is generated by a C++ program that lets the user adjust the following parameters: (i) the number of processors/caches to simulate, (ii) the number of LPs to generate, (iii) the speed of the cache, (iv) the speed of main memory, and (v) the cache hit ratio. The generation program partitions the model to take advantage of the fast intra-LP communication. The model configuration used in the experiments reported herein is 16 processors simulated using 4 LPs. The cache speed is set to 10 nanoseconds and the main memory speed to 100 nanoseconds. The cache hit ratio is set to 90%. This application has 100 simulation objects. Each processor generates a user-specified number of memory requests. Each request (also referred to as a test vector) is in fact a token that contains information about its creation time, the creating (simulation) processor, and the time at which the request should be satisfied.

RAID: The RAID application models RAID disk arrays, a method of providing a vast amount of storage [5] with higher I/O performance than several large expensive disks. This application incorporates a flexible model of a RAID disk array and can be configured with various sizes of disk arrays and request generators. Each request is in fact a token that carries information about the number of disks, the number of cylinders, the number of tracks, the number of sectors, the size of each sector, and specific information about which stripe to read and parity information. The following configuration is used for the data reported in this paper: 20 source processes generate 1000 requests each to 8 disks via 4 forks. The application is partitioned into 4 LPs.


Figure 5. Dynamic Check-pointing (normalized performance for RAID and SMMP under periodic check-pointing & aggressive cancellation, periodic check-pointing & lazy cancellation, and dynamic check-pointing & lazy cancellation)

Figure 6. Execution time (secs) vs. number of requests for RAID (20 processes, 4 forks, 8 disks, 4 LPs) under the AC, LC, DC, ST0.4, PS32, and PA10 strategies

8 Analysis

The performance of the check-pointing algorithms is shown in Figure 5. The SMMP model processed 11,300 committed events per second when no dynamic optimizations were used; RAID processed 10,917 committed events per second. The all-static configuration was used as the base value against which the results are normalized (i.e., 1.0 in the graph represents 11,300 committed events per second for SMMP when periodic check-pointing and aggressive cancellation are used). Dynamic check-pointing improved the performance of the simulation by 30% in the best case.

The performance of dynamic-cancellation relative to aggressive and lazy-cancellation was studied for both applications. Figure 6 plots execution time as a function of the number of requests for RAID for the following cancellation strategies:

- AC: Aggressive-cancellation
- LC: Lazy-cancellation
- DC: Dynamic-cancellation with Filter Depth = 16, A2L Threshold = 0.45, and L2A Threshold = 0.2
- ST0.4: Dynamic-cancellation with a single threshold, A2L Threshold = L2A Threshold = 0.4
- PS32: Dynamic-cancellation with the cancellation strategy permanently set after 32 (Filter Depth) comparisons
- PA10: Dynamic-cancellation with the cancellation strategy permanently set to aggressive if 10 successive comparisons result in misses

In this application, all disk objects favor lazy-cancellation while all fork objects favor aggressive-cancellation. In the model configuration chosen for this investigation, there are more disk objects than fork objects.

Consequently, lazy-cancellation provides better performance than aggressive-cancellation. Dynamic-cancellation performs better than lazy-cancellation because each object strictly favors either aggressive or lazy-cancellation. DC and ST0.4 perform 1.5% faster than lazy-cancellation, while PS32 and PA10 provide a 2.5% speedup (because the cost of performing passive comparisons is completely avoided by the objects that favor aggressive-cancellation). Figure 7 shows the execution time as a function of the number of requests for SMMP for the following cancellation strategies:

- AC, LC, and DC remain the same as before
- PS: Dynamic-cancellation with the cancellation strategy permanently set after 64 (Filter Depth) comparisons
- PA: Dynamic-cancellation with the cancellation strategy permanently set to aggressive if 10 successive comparisons result in misses

In this application, all the objects strictly favor lazy-cancellation. This results in a 15% speedup over aggressive-cancellation. Consequently, all the variations of dynamic-cancellation perform on par with lazy-cancellation. PS64 performs slightly better than DC and PA because it permanently switches to lazy-cancellation after 64 comparisons and does not have to monitor the hits (and misses) throughout the simulation.

Figures 8 and 9 show the performance of the FAW and SAAW policies for different aggregate ages on a network of SUN SPARC workstations. Clearly, aggregation yields considerable speedup (30% in the best case) on a network of workstations. There appears to be an "optimal" window size for which the aggregation performance is best for each application. Window sizes smaller than the optimum are too conservative; additional aggregation is possible without hurting performance. Conversely, window sizes greater than the optimal value delay the messages excessively, nullifying the benefit obtained from the additional aggregation. The SAAW strategy is superior to FAW because it is able to converge on the optimal window size dynamically. Note that the window size is statically fixed for the FAW strategy, whereas the statically fixed window size for the SAAW strategy is only the initial window size. We expect that with more sophisticated adaptation of the window size, additional performance improvement can be obtained.


Figure 7. Execution time (secs) vs. number of test vectors for SMMP (16 processors, 4 LPs) under the AC, LC, DC, PS, and PA strategies

Figure 8. DyMA results for SMMP: execution time (seconds) vs. aggregate age on a network of workstations (NOW), with FAW, with SAAW, and unaggregated

Figure 9. DyMA results for RAID: execution time (seconds) vs. aggregate age on a network of workstations (NOW), with FAW, with SAAW, and unaggregated

9 Concluding Remarks

The performance of Time Warp depends on the nature of the application being simulated. The choice of the sub-algorithms that implement several of the simulator functions, as well as the parameter settings for these sub-algorithms, significantly affects performance; the settings of these sub-algorithms and parameters are collectively called the configuration of the simulator. Most simulators operate with a static configuration that is either set by default or determined by the user at compile or simulation time. However, a static configuration is limited because: (i) it is difficult to determine an optimal configuration statically; (ii) the optimal configuration may vary during the simulation time; and (iii) the optimal configuration may be different across LPs in the same simulation.

This paper demonstrates that the performance of Time Warp simulators benefits from dynamic control of the configuration to match the current behavior of the simulator. It presents a linear-control model for on-line configuration of the simulation. The model differs from traditional control theory because data-sampling and parameter adjustment are intrusive; these operations contend for processor cycles that could be used for useful computation. In addition, this control process is, by necessity, imprecise. Consider that: (i) changes to a local LP configuration have secondary effects (on other LPs and the system), and tertiary effects that are reflected back to the LP, and these effects are difficult to quantify; and (ii) because data-sampling and parameter adjustment are intrusive, it is usually not feasible to implement the most accurate control system if it is too complex. This was illustrated by the periodic state-saving control system, where our simple heuristic outperformed the more rigorous (but more costly) techniques.

The on-line configuration model was used to adapt the behavior of different facets of the simulation. In particular, control strategies for the dynamic adjustment of (i) the checkpoint interval, (ii) the cancellation strategy (lazy or aggressive), and (iii) the dynamic aggregation of application messages were presented. The public-domain WARPED Time Warp simulation kernel was modified to implement these optimizations. Performance results for the control strategies demonstrate that adaptive schemes can improve performance. Because of the imprecise nature of the control systems, it is nearly impossible to find their optimal settings. We relied on empirical data to guide the design of the control systems. We believe that further analysis of the parameters of the control model will allow better control systems to be constructed and yield further improvements in performance.




References

[1] Astrom, K. J., and Wittenmark, B. Adaptive Control. Addison Wesley, Reading, MA, 1989.

[2] Ball, D., and Hoyt, S. The adaptive time-warp concurrency control algorithm. In Distributed Simulation (Jan. 1990), Society for Computer Simulation, pp. 174–177.

[3] Bellenot, S. State skipping performance with the time warp operating system. In 6th Workshop on Parallel and Distributed Simulation (Jan. 1992), Society for Computer Simulation, pp. 53–61.

[4] Chandy, K. M., and Sherman, R. Space-time and simulation. In Distributed Simulation (1989), Society for Computer Simulation, pp. 53–57.

[5] Chen, P. M. RAID: High-performance, reliable secondary storage. ACM Computing Surveys 26, 2 (June 1994), 145.

[6] Ferscha, A. Probabilistic adaptive direct optimism control in time warp. In Proc. of the 9th Workshop on Parallel and Distributed Simulation (PADS 95) (June 1995), pp. 120–129.

[7] Fleischmann, J., and Wilsey, P. A. Comparative analysis of periodic state saving techniques in time warp simulators. In Proc. of the 9th Workshop on Parallel and Distributed Simulation (PADS 95) (June 1995), pp. 50–58.

[8] Fujimoto, R. Parallel discrete event simulation. Communications of the ACM 33, 10 (Oct. 1990), 30–53.

[9] Fujimoto, R. M. Time warp on a shared memory multiprocessor. Transactions of the Society for Computer Simulation (July 1989), 211–239.

[10] Jefferson, D. Virtual time. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), 405–425.

[11] Lamport, L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM (July 1978), 558–565.

[12] Lin, Y. Estimating the likelihood of success of lazy cancellation in time warp simulations. International Journal in Computer Simulation (1996).

[13] Lin, Y.-B. Memory management algorithms for optimistic parallel simulation. In 6th Workshop on Parallel and Distributed Simulation (Jan. 1992), Society for Computer Simulation, pp. 43–52.

[14] Lin, Y.-B., Preiss, B. R., Loucks, W. M., and Lazowska, E. D. Selecting the checkpoint interval in time warp simulation. In Proc. of the 7th Workshop on Parallel and Distributed Simulation (PADS) (July 1993), Society for Computer Simulation, pp. 3–10.

[15] Martin, D. E., McBrayer, T., and Wilsey, P. A. WARPED: A time warp simulation kernel for analysis and application development, 1995. (Available on the www at http://www.ece.uc.edu/˜paw/warped/.)

[16] Martin, D. E., McBrayer, T. J., and Wilsey, P. A. WARPED: A time warp simulation kernel for analysis and application development. In 29th Hawaii International Conference on System Sciences (HICSS-29) (Jan. 1996), H. El-Rewini and B. D. Shriver, Eds., vol. I, pp. 383–386.

[17] Matsumoto, Y., and Taki, K. Adaptive time-ceiling for efficient parallel discrete event simulation. In Object-Oriented Simulation Conference (OOS '93) (Jan. 1993), T. Beaumariage and C. Roberts, Eds., Society for Computer Simulation, pp. 101–106.

[18] Mattern, F. Efficient algorithms for distributed snapshots and global virtual time approximation. Journal of Parallel and Distributed Computing 18, 4 (Aug. 1993), 423–434.

[19] Misra, J. Distributed discrete-event simulation. Computing Surveys 18, 1 (Mar. 1986), 39–65.

[20] Palaniswamy, A., and Wilsey, P. A. Adaptive bounded time windows in an optimistically synchronized simulator. In Third Great Lakes Symposium on VLSI (1993), pp. 114–118.

[21] Palaniswamy, A., and Wilsey, P. A. Adaptive checkpoint intervals in an optimistically synchronized parallel digital system simulator. In VLSI 93 (Sept. 1993), pp. 353–362.

[22] Palaniswamy, A., and Wilsey, P. A. Scheduling time warp processes using adaptive control techniques. In Proceedings of the 1994 Winter Simulation Conference (Dec. 1994), J. D. Tew, S. Manivannan, D. A. Sadowski, and A. F. Seila, Eds., pp. 731–738.

[23] Palaniswamy, A., and Wilsey, P. A. Parameterized time warp: An integrated adaptive solution to optimistic PDES. Journal of Parallel and Distributed Computing 37, 2 (Sept. 1996), 134–145.

[24] Perry, D. L. VHDL, 2nd ed. McGraw–Hill, New York, NY, 1994.

[25] Preiss, B. R., MacIntyre, I. D., and Loucks, W. M. On the trade-off between time and space in optimistic parallel discrete-event simulation. In 6th Workshop on Parallel and Distributed Simulation (Jan. 1992), Society for Computer Simulation, pp. 33–42.

[26] Radhakrishnan, R., Moore, L., and Wilsey, P. A. External adjustment of runtime parameters in time warp synchronized parallel simulators. In 11th International Parallel Processing Symposium (IPPS '97) (Apr. 1997), IEEE Computer Society Press, pp. 260–266.

[27] Rajan, R., Radhakrishnan, R., and Wilsey, P. A. Dynamic cancellation: Selecting time warp cancellation strategies at runtime. International Journal in Computer Simulation (1997). (Forthcoming.)

[28] Reiher, P. L., Wieland, F., and Jefferson, D. R. Limitation of optimism in the time warp operating system. In Winter Simulation Conference (Dec. 1989), Society for Computer Simulation, pp. 765–770.

[29] Rönngren, R., and Ayani, R. Adaptive checkpointing in time warp. In Proc. of the 8th Workshop on Parallel and Distributed Simulation (PADS 94) (July 1994), Society for Computer Simulation, pp. 110–117.
