Discrete event simulation on Cray T3E: A progress report of some exploratory work on urban transportation networks Behrouz Zarei Management Science Department, Lancaster University, Lancaster LA1 4YW, UK Email:
[email protected]
Abstract This paper introduces discrete event simulation as a tool for the analysis of stochastic systems. It argues that simulation of such systems may require computations beyond the power of single-processor computers, and suggests that running the simulation on a multiple-processor platform can resolve this problem. There are several alternative ways of carrying out parallel simulations, each of which is described, followed by a discussion of the state of the art in this field. To motivate parallel simulation, an urban transportation network analysis and design problem is presented. A parallel computer is necessary for four reasons: the problem is complex, the simulation takes a long time to reach a steady state, each simulation run is very time-consuming, and there is the additional issue of state dependency. Performance results of different parallel simulation scenarios on 22-, 23-, and 44-processor configurations of a Cray T3E 1200E are reported.
1. Introduction Investigation of the behaviour of city infrastructural networks such as natural gas pipes, power cables and city streets under different scenarios is clearly of great interest to urban planners, but a considerable challenge for modellers. The aim of such investigations is to pinpoint any shortcomings of these networks and to propose much-needed improvements in order to make them more reliable in any situation. Such a study can be approached from different angles: from an engineering point of view, where there is more interest in the details of the network, to a managerial one, where concerns are more general. In this paper we concentrate on a parallel simulation of a city street network on a Cray T3E. By city network we mean all routes of the city along which vehicles can pass (other routes are ignored). The model is not trivial: it includes several hundred nodes and arcs, its simulation takes a long time, and the nature of the problem means that all simulation runs require long steady-state periods. Section 2 presents the discrete-event simulation paradigm. Section 3 discusses how such a simulation can be run on a parallel platform and how, in a distributed-model-components approach, communication is achieved between different processors. Section 4 describes an urban transportation network and analyses the problem via parallel simulation. Parallel aspects of this model are presented in section 5, followed by performance results of the parallel simulation execution using different numbers of processors. Guidelines for improving the transportation network itself are not discussed.
2. Discrete-event simulation approach Computer simulation becomes a legitimate research tool when known analytical methods cannot supply a solution to the problem. One approach to computer simulation is to treat the system under study as a sequence of discrete events1 which take place at selected instants. The values of the simulation variables at such an instant are referred to as the state of the system; the transition from one state to another represents the evolution of the system over time, and the agent that performs this transformation is referred to as an event. It is assumed that the state of the system remains unchanged between two consecutive events. This evolution continues until a condition emerges indicating the termination of the simulation. Events have common attributes, one of which is a timestamp representing the point in
time when the event in the actual system will occur. During the simulation run, when the timestamp of an event is fixed, the event is scheduled in a list ordered by ascending timestamp, known as the event list. This list facilitates the management of events in the simulation program. Simulation programs also maintain another time element, the so-called simulation clock, which represents the time of the last executed event in the program. When an event is executed, it is removed from the event list, the simulation clock advances to the timestamp of the executed event, and the simulation variables are updated. This continues until either there are no more events to execute or the required length of the simulation has elapsed. This approach has been successfully adopted across a wide range of applications, including flexible manufacturing systems, computer networks, and health care. It is argued that simulation enables us to better understand and analyse complex systems such as telecommunication networks, transportation systems, and military operations, where analytical models have proved to be inefficient. Nevertheless, simulation of complex systems requires special considerations, including the long execution time of the simulation program. The simulation execution time for these systems may be quite simply beyond the capacity of even very fast single-processor computers. In other cases, in order to perform more experiments, it is desirable to run the simulation as quickly as possible, or to embed the simulation in an iterative technique so as to investigate the system under different sets of parameters rather than just using the simulation for "what-if" questions. Running such a model on multiple processors appears to be a viable solution to this problem.
3. Parallel discrete-event simulation Parallelisation of a simulation program has been studied on distributed-memory, shared-memory, and SIMD (single-instruction-stream, multiple-data-stream) platforms. Distributed memory exploits the inherent parallelism of the simulation and so has the greatest potential for parallel simulation, but at the same time developing the program is a greater challenge than for the other alternatives. To implement this approach successfully, we need to decompose and allocate the simulation effectively. There are five fundamental approaches to simulation decomposition, each with its own strengths and weaknesses. 1. Parallelising compilers. This method uses a parallelising compiler to find sequences of code in a sequential simulation that can be processed in parallel on separate processors of a multiprocessor2. In practice, it is often difficult to rely on compilers to partition and distribute the source code without the programmer encoding explicit directives, and significant speedup will not occur unless most of the program can be parallelised. If significant parts of the computation must be executed sequentially, then a fast uniprocessor will often outperform a multiprocessor. 2. Distributed experiments. With this method, independent replications of a serial simulation are run on N processors3. One can expect a virtual speedup of N with N processors if the lengths of the experiments are approximately equal. This approach allows simultaneous simulations, each with different parameters. 3. Distributed language functions. This approach involves the assignment of simulation support tasks (e.g. graphics generation) to various processors4. This is a very clean approach from a design standpoint, but in some cases the synchronisation costs of tightly coupled functions may eliminate any overall gain.
This method has the advantage of avoiding the deadlock problem and is transparent to the user, but it cannot exploit any inherent parallelism in the system being modelled. 4. Distributed events. It is possible to distribute the execution of events from a global event list. To do this, special protocols are required because currently processed events may affect the next event; event dependencies must therefore be known prior to scheduling. This approach is more appropriate for shared-memory systems5. A master processor maintains the global event list, and each processor consults this list for the next event to be executed. This method is suitable for a small number of processors, or when the components of the system require a large amount of global information. Its disadvantage is that failure of the master processor can lead to unrecoverable failure.
5. Distributed model components. In this method, the simulation model is decomposed into loosely coupled components. This approach exploits the inherent parallelism in the model but requires careful synchronisation, usually controlled by message-passing. Of the five approaches it has the greatest potential for exploiting inherent parallelism in the system, and it is the approach to parallelisation adopted in this paper. Figure 1 depicts how this method maps a single simulation program onto three processors.
Figure 1: Mapping the three components of a simulation model to three processors. In Figure 1 the model is divided into three components: Model 1, Model 2, and Model 3. Each processor runs a component, maintaining its own behaviour, clock and event list, and exchanging messages to handle the interfaces between components. One problem with running simulation models on a distributed-memory platform, termed causality error, is that events may be executed in the wrong sequence; this is a consequence of decomposing the model into loosely coupled components. In sequential simulation this problem is avoided because events are scheduled in a timestamp-ordered event list, and executing the first event in the list ensures that events are processed in timestamp order. When the components and events of the model are distributed across different processors, it is still necessary to process the events in exactly the same order as the sequential simulation would. This demands careful synchronisation through the exchange of messages. This differs from typical applications in the parallel computing community, where the sub-models run independently until some pre-specified condition arises and the processors must then be synchronised. To synchronise processors in parallel simulation, four groups of protocol are deployed: conservative, optimistic, hybrid, and adaptive. In a conservative mechanism, if a processor contains an unprocessed event E1 and can determine that it is impossible to receive another event with a timestamp less than that of E1, then the processor can safely process E1. Despite the straightforward nature of this protocol, it can lead to deadlock. In order to avoid this and achieve good performance, it is vital to exploit an attribute of the model known as lookahead: a quantity that allows different processors to process their events independently.
The implementation of the conservative mechanism started with the deadlock avoidance algorithm6; later, in order to reduce the communication drawbacks of this approach, a range of algorithms was proposed, from deadlock detection and recovery7, conservative time windows8 and sending messages on demand9 to synchronous execution10. The optimistic approach is based on the idea that each processor independently processes its events regardless of any possible causality errors. However, a processor may receive messages, called stragglers, which violate its local causality. In this case the simulation has to roll back to a safe moment in simulation time which is not affected by the straggler, and all wrongly propagated messages have to be cancelled. It is therefore necessary to save the states of the simulation for possible later reference. This is performed via two well-known techniques. The first is copy state saving, in which a copy of all of the modifiable state variables within the processor is made before processing an event. The second involves incremental
state saving, where we keep only those variables which have been changed as a result of event processing. A mechanism termed anti-messages also has to be employed to correct wrongly sent messages. When a straggler arrives, the state variables of the processor are restored to a time just before the straggler's timestamp, either by finding the latest unaffected state or by returning through the chain of changes which occurred in the states of the system. Correcting wrongly sent messages is straightforward. Upon sending a message to another processor, a copy of the message with a different tag, the anti-message, is saved in the sender. When a straggler arrives at a processor, the anti-messages of all wrongly sent messages are forwarded to their receivers. At the receiver, if the original message has not yet been processed, it is matched with the anti-message and both are annihilated. If it has already been processed, the anti-message is treated as a straggler; and if the original message has not yet arrived at the receiver, the anti-message waits for its original counterpart and then both are annihilated. As explained earlier, before executing an event the state of the system must be saved. The number of events in a medium-sized simulation may well be large, which means that a huge amount of memory is required to save the states of the system. In order to reduce memory usage, it is recommended to reclaim the memory of those states (so-called fossil states) to whose times the simulation can never roll back. To identify these states, the idea of Global Virtual Time (GVT) is introduced. Fujimoto11 defined GVT as follows: Global Virtual Time at wall clock time T (GVT_T) during the execution of a Time Warp simulation is defined as the minimum timestamp among all unprocessed and partially processed messages and anti-messages in the system at wall clock time T. It is safe to discard states with a timestamp smaller than GVT, and their memory can then be reclaimed by the system.
There are numerous algorithms to improve the performance of Time Warp, including lazy cancellation12, lazy re-evaluation13, and the Moving Time Window14,15. In practice, it seems difficult to prefer one of the above approaches over another. This raises the question of whether hybrid algorithms with some conservative and some optimistic characteristics are preferable. Such protocols are intended to combine advantages of the optimistic approach, such as less reliance on model-specific information, with advantages of the conservative approach, such as lower memory usage. It is also desirable that a synchronisation protocol adapt itself to changes in the model characteristics in order to optimise performance. This leads to a group of protocols that can automatically adjust themselves along the conservatism-optimism continuum, i.e. adaptive protocols16. Ideally these protocols monitor the parallel simulation, estimate the trade-off between the conservative cost (blocking) and the optimistic cost (state saving and rollback), and adjust the protocol control parameters accordingly.
4. Problem setting: an urban transportation network simulation The basic objective of any urban transport network is to provide an infrastructure for a good service to the public. This network must facilitate movement between different places in a city; and for cities exposed to natural hazards, such as earthquakes, special considerations should be taken into account. Estimating the city traffic pattern and the efficiency of the network during the emergency period following an earthquake is challenging for these cities. Therefore, to evaluate a city transportation network with reference to traffic, transportation research requires an abstract model of the city street network. In this paper such a model is represented by a network (N, A), where N is the set of nodes and A is the set of arcs joining the nodes. Nodes are land uses (or groups of land uses), crossroads or junctions, and arcs are streets. A trip is a journey between two nodes along a path of arcs. It is assumed that each trip starts from and ends at a land use such as housing, retail stores, gymnasiums, schools, religious facilities or governmental offices. Since it is not possible to consider all land uses individually and to calculate the distance between them, it is necessary to combine a group of land uses together and to
consider them as one end of a trip, represented by a node. For instance, in modelling residential buildings, houses or apartment buildings located in one quarter or neighbourhood are considered as one unit and represented by one node in the city network. Similarly, all shops and shopping centres in one block or street are combined into one shopping unit and shown as a node, and governmental or business offices in one area are likewise put together as one node. There are also a number of single land uses, such as a fire station or a hospital, which are represented by a node of their own because of their importance or uniqueness. As a result of this simplification, the existing network of the city appears as in Figure 2(a).
Figure 2(b): A selection of the network
Figure 2(a): City street network
A criterion for evaluating this network is accessibility, which represents the extent to which the network can provide access for demanded trips in a specific time period. In order to maintain an accessible network, two factors have to be considered: travel time and safety. Travel time is among the most important factors in choosing a route for daily trips. In earthquake situations it is particularly important for the rapid transport of casualties to hospitals, for the setting up of temporary medical care, and for fast access of other emergency services such
as the fire brigade and the police to the damaged parts of the city. To estimate the travel time on the arcs of the network, the length and speed of the arcs are considered. According to the Road Safety Office, there are three types of arcs, representing streets with speed limits of 30, 60 or 100 km/h. In a survey we carried out in order to combine these two factors, we found that the most common and convenient way to compute travel time was to multiply the length of the street by the reciprocal of the speed. Safety is also important, particularly in earthquake situations where unsafe routes are probably obstructed or intentionally avoided since they are likely to be dangerous. To estimate the safety of the arcs, many factors such as electricity wires, water pipes, gas pipes, possible faults, quality of construction and building height are considered. Combining these factors leads to a safety number for each arc of the network. We wish to understand the network's level of accessibility before, during and after an earthquake. One approach to investigating this preparedness is to generate possible trips before, during and after an earthquake and study the behaviour of the network. This was performed using information collected via a questionnaire distributed throughout the city. Also, to estimate the importance of the different trips, various weights were assigned to different types of trips using AHP (the analytical hierarchy process). A simulation was then run to represent the traffic behaviour through different time periods. Obviously the length of the simulation had to be long enough to represent the different time periods of the normal condition, such as days and nights, weekends and working days. Also, to imitate the traffic behaviour during and after the earthquake, different phases including temporary settlement and recovery had to be simulated.
Therefore it took a very long time to reach a steady state in the simulation, and long simulation runs were required to undertake a statistical analysis of the results. This was clearly beyond the power of sequential computers. The results of this analysis enabled us to find out which streets play an important role in providing fast access, safe access or both under different scenarios. It may also help decision-makers to gain insight into possible modifications needed to keep the network prepared for an earthquake. Such an investigation raises the following questions. What changes are needed in the existing network to achieve better performance of the street network during an earthquake crisis? Which paths are best for emergency trips after an earthquake? Which streets should be cleared first if they are blocked after an earthquake? What are the trip paths, street utilisations and trip priorities? Responding to these questions helps in estimating and analysing the traffic loads in the city. An algorithm17 based on optimisation techniques has been suggested for this problem. The algorithm requires solving just under 44 thousand linear programming models to represent the trips, each with 758 variables, representing the arcs, and 353 constraints, indicating the nodes of the network. A simplistic assumption, such as fixing a trip's path at its start node, degrades the applicability of this algorithm. The algorithm also ignores the stochastic elements of the system, since these were difficult to tackle with mathematical programming techniques. The simulation approach was therefore adopted to address the dynamic behaviour of the model along with its stochastic elements.
5. Parallel simulation of the transport network There are four widely used approaches to modelling for discrete simulation: the event approach, the activity approach, the process-interaction approach, and the three-phase approach. All four methods have a control program (the so-called executive) in common, which is responsible for sequencing the operations that occur as the simulation proceeds. These executives perform three tasks: time scan, event scan, and event execution. In all cases, before executing the events in the list, conditions must be tested. In the event approach, if these conditions are met the event is executed instantly; in the three-phase approach, if the condition is verified the state is updated and B activities are scheduled for further processing; and in the process approach, the entity is examined to see whether it is blocked, delayed, or allowed to continue its journey through the process. In all these approaches the time required by the processor for these operations is negligible.
The simulation approach employed for the transport network is slightly different, in the sense that when an event is executed, scheduling new events in the event list solely as a result of that execution is not possible. Rather, an optimisation problem must be solved before the next event can be scheduled. In this paper this is called state dependency analysis: to schedule an event it may be necessary to perform some computations and comparisons in order to choose one event among the different events that could happen, and this computation may need a considerable amount of time. For instance, when a trip starts from a node and there are several alternative routes to the destination, some intelligent agency, say experience, or a calculation of the journey time of the trip, is needed to choose a route. Employing this state-dependent scheduling approach, along with the other justifications already presented for using a parallel computer, reveals the importance of parallel simulation in tackling this problem. A simplification consistent with the simulation literature can be made in which probabilities derived from a distribution function are used to direct trips onto arcs; this method, despite its simplicity, does not imitate the system's behaviour as it is. This paper offers an analysis of a selected section of the network, as shown in Figure 2(b), and presents four sets of experiments to evaluate the effects of the parallel program under state-dependent scheduling. The parallel simulation model was run on a Cray T3E 1200E, with 256 Mbytes of memory per processor and a 2 Tbytes high-performance disk. This system has a peak performance of just under 700 Gflops and a sustained performance of 122 Gflops on the NAS parallel benchmark.
The model was developed in C using the gcc compiler; the MPI library was used for message-passing, and the messages carried synchronisation information between processors in addition to the model data. A modified CMB algorithm18 was employed in which each processor exchanged messages only with those processors with which it had a common link. Different experiments were performed, each with 50 iterations, 500 entities for warming up, and different numbers of entities, depending on the workload, for data analysis. The model was run under different strategies to measure the performance of parallel execution.
6. Results and discussions The simulation model can be parallelised in different ways. Four sets of experiments were performed for different workloads of 1000, 2000, 2500, and 3000 trips in the network. Each simulation must contain both normal and earthquake situations in order to cover all time periods; therefore the model is first run for trips which happen in the normal situation, and then for the trips which are required during and after the earthquake. The state analysis in these two conditions differs (because the main objectives of the trips differ): in the first case expected travel time is of particular interest, whereas in the second case a balance between safety and travel time is desired, and during the emergency period the weights of these two factors change. The AHP provides a sound basis for combining these two factors. In this paper the main purpose of the parallel simulation is to understand how state dependency affects the performance measures of the parallel program. The experiments are: • Each processor maintains a node of the network (i.e. 22 processors) and simulates interactions within the node and with neighbouring nodes. It is assumed that scheduling a new event in the event list follows directly from event execution and requires no specific computation, except where multiple routes exist, in which case generating random variates from a distribution function determines the next route of the trip. This is referred to as the NoStateDependent strategy. • Employing no extra processor, so that each processor is responsible for both simulation and state analysis. In this case, when a vehicle arrives at a node, the processor performs the simulation until state analysis is required; it then analyses the state and makes the comparison among the potential states in the same processor.
This scenario (called NoMoreProcessor) still needs 22 processors (but with more workload per processor), and no extra messages are exchanged for state analysis. • All state dependency computations are allocated to one processor, called the state dependency analyser. When a vehicle arrives at a node, its corresponding processor sends a
message containing its present node, its destination and other required state variables to the state analyser, which evaluates the different routes (according to the problem objective) and returns the best route to the sending processor. This enables the simulation program to schedule the next event in the event list. This scenario (termed OneExtraProcessor) employs 23 processors. However, an excessive amount of time spent in state analysis can affect the performance of the simulation. • An extra processor is allocated to analyse the state of each node. In this method each simulation processor sends a message including its current state to its private state analyser. The performance of this method depends on the cost of sending extra messages and the benefit of parallelising the state analysis. In this case (known as DoubleProcessor) the number of processors is 44.
Studying the communication aspects of parallel simulation matters more on distributed-memory platforms, where communication is relatively expensive, than on tightly coupled shared-memory platforms. It has been shown that lookahead crucially affects the performance of conservative protocols, since it reduces the number of null messages and allows the processors to spend more of their time processing events. In the presence of state dependency, however, improving the lookahead was not expected to yield performance gains, because of the amount of computation required for state analysis. As Figure 3 indicates, the communication cost for small lookahead is so high that improving the lookahead still reduces the run time; for larger lookahead this is no longer the case, as the run time is limited by the time required by the state analysers. The figure also shows that the number of trips in the network does not affect this behaviour. This contrasts with the literature, in the sense that in more event-congested models improving the lookahead normally yields greater performance gains. The behaviour of the total network of Figure 2(a) for different numbers of trips, however, is not yet known.
Figure 3: Effects of lookahead on run time (DoubleProcessor scenario; workloads of 1000-3000 trips; lookahead varied from 0.01 to 2)
Run time and null-message counts are also used to evaluate the performance of the different scenarios. To make the results comparable, models with lookahead equal to 1 are used. Figure 4 shows the run-time results. In the NoStateDependent scenario, running the simulation takes about 1 second. The NoMoreProcessor scenario, where both the simulation and the state analysis of a node are performed on the same processor, increases the run time to about 70 seconds on average, due to the heavier load on the processors. Allocating all state analysis to a single processor, OneExtraProcessor, increases the run time to just over 10000 seconds, reflecting the enormous load on the state analyser. Separating these two operations and allocating each to its own processor in the DoubleProcessor strategy reduces the run time to just over 1 second.
Figure 4: Effects of different parallelisation scenarios on run time (seconds, logarithmic scale, for 1000-3000 trips)
Figure 5 shows the communication cost of synchronisation in the different scenarios. The NoStateDependent and NoMoreProcessor scenarios require the same numbers of null messages; this is to be expected, because the simulation and state analysis of a node reside in the same place. In the OneExtraProcessor scenario, only communication with the extra processor is added, whereas in DoubleProcessor, communication with all the state-analysis processors must be included. In DoubleProcessor the ratio of null messages to additional trips is higher than in OneExtraProcessor. For the full network of Figure 2(a), this communication cost can limit the number of extra processors that can be used for state analysis.
Figure 5: Effects of different parallelisation scenarios on null messages (0 to 20000); x-axis: number of trips; series: NotStateDependent, NoMoreProcessor, OneExtraProcessor, DoubleProcessor.
7. Conclusions Analysis of the different scenarios reveals that allocating both simulation and state analysis to the same processor overloads every processor, which ultimately extends the simulation run time well beyond the case with no state analysis. Performing all state analysis on a single processor makes the run time longer still, because of the enormous computation that processor must carry out, which turns it into the bottleneck of the parallel program. A cost-benefit analysis indicates that, for a network of the size of our case, it is efficient to employ an extra processor to analyse the states of each node. In other words, the benefit of distributing the state analysis justifies the
communication cost. For larger networks this would not be the case, because more time is needed to analyse the states. One alternative is to employ parallel optimisation techniques alongside the parallel simulation, in order to reduce the state-analysis time and keep the simulation run time at the same level as the case with no state analysis.
Acknowledgements The author wishes to thank Dr. Mohammad Moddares for his help in formulating the network problem and Professor Mike Pidd for his comments during the simulation phase of the project. This work is supported by a scholarship from the Iranian Ministry of Science, Research and Technology.
8. References
1. Pidd, M. (1998). Computer Simulation in Management Science (4th edition). John Wiley & Sons Ltd, Chichester.
2. Chandak, A. and Browne, J. C. (1983). Vectorization of Discrete Event Simulation. Proceedings of the 1983 International Conference on Parallel Processing: 359-361.
3. Biles, W. (1985). Statistical Considerations in Simulation on a Network of Microcomputers. Proceedings of the 1985 Winter Simulation Conference: 388-393.
4. Comfort, J. C. (1984). The Simulation of a Master-Slave Event Set Processor. Simulation: 42(3), 117-124.
5. Zhang, G. and Zeigler, B. P. (1989). DEVS-Scheme Supported Mapping of Hierarchical Models onto Multiple Processor Systems. Proceedings of the SCS Multiconference on Distributed Simulation: 64-69.
6. Chandy, K. M. and Misra, J. (1979). Distributed Simulation: A Case Study in Design and Verification of Distributed Programs. IEEE Transactions on Software Engineering: SE-5(5), 440-452.
7. Chandy, K. M. and Misra, J. (1981). Asynchronous Distributed Simulation via a Sequence of Parallel Computations. Communications of the ACM: 24(4), 198-206.
8. Lubachevsky, B. D. (1989). Efficient Distributed Event-Driven Simulation of Multi-Loop Networks. Communications of the ACM: 32, 111-123.
9. Su, W. K. and Seitz, C. L. (1989). Variants of the Chandy-Misra-Bryant Distributed Discrete-Event Simulation Algorithm. Proceedings of the SCS Multiconference on Distributed Simulation: 38-43.
10. Ayani, R. (1989). A Parallel Simulation Scheme Based on the Distance Between Objects. Proceedings of the SCS Multiconference on Distributed Simulation: 21(2), 113-118.
11. Fujimoto, R. M. (2000). Parallel and Distributed Simulation Systems. John Wiley & Sons.
12. Gafni, A. (1988). Rollback Mechanisms for Distributed Simulation Systems. Proceedings of the SCS Multiconference on Distributed Simulation: 61-67.
13. West, D. (1988). Optimizing Time Warp: Lazy Rollback and Lazy Re-evaluation. M.S. thesis, University of Calgary.
14. Sokol, L. M. and Stucky, B. K. (1990). MTW: Experimental Results for a Constrained Optimistic Scheduling Paradigm. Proceedings of the SCS Multiconference on Distributed Simulation: 169-173.
15. Reiher, P. L., Wieland, F. and Jefferson, D. R. (1989). Limitation of Optimism in the Time Warp Operating System. Proceedings of the 1989 Winter Simulation Conference: 765-770.
16. Das, S. R. (2000). Adaptive Protocols for Parallel Discrete Event Simulation. Journal of the Operational Research Society: 51(4), 385-394.
17. Modarres, M. and Zarei, B. (2000). Application of Network Theory and AHP in Urban Transportation to Minimize Earthquake Damages. Under revision for publication in the European Journal of Operational Research.